AD-A044 760


UNCLASSIFIED 


ALFRED P SLOAN SCHOOL OF MANAGEMENT CAMBRIDGE MASS C— ETC F/G 12/1
SET DECOMPOSITION: CLUSTER ANALYSIS AND GRAPH DECOMPOSITION TEC— ETC (U)
SEP 77 R C ANDREU N00039-77-C-0255

CISR-P010-01-01 NL


Contract No. N00039-77-C-0255


Internal Report No. P010-01-01
Deliverable  No.  A002 


TECHNICAL  REPORT  #1 
SET  DECOMPOSITION:  CLUSTER  ANALYSIS 

AND  GRAPH  DECOMPOSITION  TECHNIQUES 


R.  C.  Andreu 


September,  1977 




Principal  Investigator: 
Prof.  Stuart  E.  Madnick. 


Prepared  for: 

Naval  Electronics  Systems  Command 
Washington,  D.C.  20360 


REPORT  DOCUMENTATION  PAGE 


Technical Report #1

2. GOVT ACCESSION NO.

3. RECIPIENT'S CATALOG NUMBER

4. TITLE (and Subtitle)

Set Decomposition: Cluster Analysis and
Graph Decomposition Techniques

5. TYPE OF REPORT & PERIOD COVERED

7. AUTHOR(s)

Rafael C. Andreu

8. CONTRACT OR GRANT NUMBER(s)

N00039-77-C-0255

9. PERFORMING ORGANIZATION NAME AND ADDRESS

Center for Information Systems Research
M.I.T. Sloan School of Management - E53-330
Cambridge, MA 02139

10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS

11. CONTROLLING OFFICE NAME AND ADDRESS

Naval Electronic Systems Command
Washington, D.C. 20360

12. REPORT DATE

September 1977

13. NUMBER OF PAGES

14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office)

15. SECURITY CLASS. (of this report)

UNCLASSIFIED

15a. DECLASSIFICATION/DOWNGRADING SCHEDULE

16. DISTRIBUTION STATEMENT (of this Report)


Approved  for  public  release;  distribution  unlimited 


17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report)


19. KEY WORDS (Continue on reverse side if necessary and identify by block number)

Cluster  Analysis;  Graph  Decomposition;  Set  Decomposition;  Software 
Systems  Design. 


20. ABSTRACT (Continue on reverse side if necessary and identify by block number)

The investigation of a systematic approach for the early phases of the
system development process generates the problem of decomposing a given set
in which interdependencies have been defined among its elements, so as to
obtain a collection of subsets as free of interdependencies as possible.
This problem is analyzed and solutions proposed; cluster analysis and
graph decomposition techniques are applied and shown to possess some
similarities which allow set decomposition problems to be approached within a
unified framework.




The Center for Information Systems Research (CISR) is a research center
located and managed in M.I.T.'s Sloan School of Management; it consists of a
group of Management Information Systems specialists, including faculty members,
full-time research staff and part-time students. The Center's general research
thrust is to devise better means for designing, generating and maintaining
applications software, information systems and decision support systems.


Within the context of the research effort sponsored by the Naval Electronics
Systems Command under contract N00039-77-C-0255, CISR proposed to conduct basic
research on a systematic approach to the early phases of complex software
systems design. One of the main goals is the development of a well-defined
methodology aimed at explicitly filling the gap between system requirements
and program specifications that characterizes most traditional system design
strategies. At the heart of such a methodology is the structuring of the
initial set of requirements so as to make apparent the design trade-offs existing
among its elements; the decomposition of that set into subsets of strongly
interdependent requirements, which would define a meaningful framework for system
design, is the main focus of the proposed methodology. The research project is
organized so as to investigate the following four areas:

1) Graph-like representation of requirements sets and suitable decomposition
techniques,

2) Design and development of a set of software tools to support the set
decomposition activity,

3) Identification of a methodology for the assessment of interdependencies
among requirements, as well as guidelines for the interpretation of the
obtained decompositions and for the coordination of design subproblems,
and

4) Experimental application of the methodology and supporting tools to a
specific case, with emphasis on recommendations for their practical use
and comparison with more traditional design approaches.


This document focuses on the activities carried out at CISR to investigate
the first area outlined above.


CONTRACT  N00039-77-C-0255 
Technical  Report 

EXECUTIVE  SUMMARY 

The main thrust of this research is to explore ways in which to bring more
structure to the early stages of the software system design process. These stages
are concerned with resolving trade-offs among system requirements so as to identify
a collection of design subproblems which can be easily coordinated in the context
of the overall design. The approach taken has been to make the trade-offs among
requirements explicit and to organize the set of requirements in a graph-like
fashion, where nodes correspond to requirements and links to what we call
"interdependencies" among them. Strongly interdependent requirements should be considered
at the same time for design purposes; subsets of strongly interdependent requirements
can be thought of as defining a design subproblem. Consequently, one of the main
problems that this research is concerned with is the following:

Given a set of objects (requirements) in which interdependencies have been
defined, identify subsets of objects such that:

(a) Elements in the same subset are strongly interdependent, and

(b) Elements in different subsets are as free of interdependencies as possible.

Several approaches to this problem are analyzed and compared in this report.
It is found that cluster analysis techniques constitute a good strategy for
solving it. Furthermore, a number of interesting correspondences between
cluster analysis and graph decomposition techniques are identified which provide
a unified framework where decomposition problems can be analyzed. One of these
correspondences suggests a new clustering algorithm which requires fewer
a priori parameters than other more traditional ones. Experimental results
with this algorithm as applied to the problem outlined above will be included
in a future report.


TABLE OF CONTENTS


1.- Introduction

2.- The Basic Cluster Analysis Framework

3.- Characteristics Of Dissimilarity Measures

4.- Basic Cluster Analysis Approaches

4.1.- Agglomerative Techniques

4.2.- Partitioning Techniques

5.- Graph Decomposition Problems And Techniques; Similarities To
Cluster Analysis

5.1.- Graph Decomposition Techniques

5.1.1.- A Heuristic Approach

5.1.1.1.- Comments On A Possible Generalization
Of The Core Set Concept

5.2.- Putting The Graph Decomposition Problem In The Cluster
Analysis Framework

5.2.1.- "Minimum Path" Dissimilarity Measures

5.2.2.- "Connectivity" Dissimilarity Measures

5.2.2.1.- An Alternative Approach To Compute
Distance Matrices When Preclustering
Takes Place

5.3.- Graph Decomposition Techniques Applied To Cluster Analysis

5.3.1.- A Strategy For Constructing Initial Partitions
In Non-Agglomerative Cluster Analysis

5.3.2.- Working With The Similarity Matrix As A Whole
Prior To Applying Clustering Algorithms;
Normalization Of Distance Matrices

5.3.2.1.- Iterative Computation Of Distance
Matrices

5.3.3.- The Strategy Of Section 5.3.1 Revised

6.- Other Approaches And Problems

7.- Summary And Implications


APPENDICES

Appendix I

Appendix II

Appendix III

Appendix IV


SET DECOMPOSITION: CLUSTER ANALYSIS AND GRAPH DECOMPOSITION TECHNIQUES


1.-  Introduction 

The purpose of this report is to survey a number of techniques that
address the problem of decomposing a given set, in which "interdependencies"
among its elements have been defined, into a collection of subsets
characterized by:

a) Strong interdependencies among elements of a given subset, and

b) Weak interdependencies among elements of different subsets.

The motivation for this survey is discussed in [Andreu & Madnick 77],
where decomposing a set of system requirements into subsets with the characteristics
outlined above is presented as a means towards the end of developing a systematic
methodology applicable to the system development phase called "preliminary design"
or "architectural design" (see [Freeman 76]).

The emphasis of the discussion will be both on techniques proposed to
solve the decomposition problem and on the nature of the interdependencies among
set elements assumed by the different techniques. Since one of the goals of our
research is to investigate ways of assessing interdependencies among the requirements
established for a given system, it is appropriate to consider how similar
interdependencies have been treated in the past in order to gain insight into how we
can approach ours.

The  paper  is  organized  as  follows: 

Section 2 describes the basic framework in which most of the so-called
"cluster analysis" techniques are organized.

Section 3 discusses how cluster analysis problems are typically defined
in that framework.

Section 4 describes two basic approaches to cluster analysis and discusses
some of their pros and cons.

Section 5 focuses on a decomposition problem that doesn't quite fit
the framework of cluster analysis but which is relevant for our purposes
because (i) it represents a starting point for the formulation of our problem,
(ii) it can be transformed to fit the framework, and (iii) it provides insight
as to how some "clustering algorithms" can be improved.

Section 6 briefly reviews, for completeness, a few techniques appropriate
only in very special cases.

Section 7, finally, discusses some practical implications of sections
4, 5 and 6, and proposes a strategy for choosing a decomposition technique given
the characteristics of the problem at hand.




2.-  The  basic  cluster  analysis  framework. 


The purpose of cluster analysis, as defined by Hartigan ([Hartigan 75]),
is to "group similar objects". Whatever the technique employed, this definition
implies that information is available regarding the extent to which objects in
the initial set are similar or dissimilar. Most cluster analysis techniques
assume that such information is available in the following way:

Let $O: \{o_1, \ldots, o_i, \ldots, o_n\}$ be the initial set of (n) objects
in which clusters ("groups of similar objects") are to be identified.
An object is characterized by a set of "attributes",
$X: \{x_1, \ldots, x_j, \ldots, x_m\}$, measured in some scale(s).

Thus, an object $o_i \in O$ is characterized by a vector

$$X_i = (x_{i1}, \ldots, x_{ij}, \ldots, x_{im}) \qquad (1)$$

In this sense, $X_i$ is a representation of object $o_i$ for the purposes
of the analysis.

Cluster analysis algorithms, as discussed in the next section, assume
that "similarity" or "dissimilarity" measures between any pair of objects
$o_i, o_j \in O$ are available. These measures are computed from the objects'
representations $X_i$; several ways of doing so have been proposed, as is pointed
out briefly in the next section.
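As an illustration of how a dissimilarity measure can be computed from the representations, the following minimal sketch builds the full matrix of pairwise dissimilarities for a small set of objects. It assumes purely numeric attributes and uses the Euclidean distance as the combining function; both choices, as well as the name `dissimilarity_matrix` and the sample data, are assumptions made for this example and are not prescribed by the text.

```python
import math


def dissimilarity_matrix(vectors):
    """Return the n x n matrix of Euclidean distances between attribute vectors.

    Assumption: every attribute is numeric and measured on a comparable scale.
    """
    n = len(vectors)
    s = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(vectors[i], vectors[j])))
            s[i][j] = s[j][i] = d  # the matrix is symmetric
    return s


# Hypothetical objects, each described by m = 2 numeric attributes.
X = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
S = dissimilarity_matrix(X)
```

Attributes measured on other scales would first require a conversion step before a function of this kind could be applied.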

From a broader perspective, in the context of a set of objects O
represented by a set of vectors $X_i$ of the form (1), there are two things that
can be "clustered":

a) Objects (i.e., identifying groups of similar objects, the objective
of cluster analysis), and

b) Attributes (i.e., identifying groups of attributes that define the
"main components" of vectors representing objects in some set, the realm of
"factor analysis").

In this paper, we focus on techniques that address the first of these
problems, although in the final section we suggest that the second can be
relevant for our purposes in order to give meaning to the subsets identified in a
decomposition process.




Size of range set (columns):

  Continuous: May assume an uncountably infinite number of values.
  Discrete: May assume a finite (at most countably infinite) number of values.
  Binary: May assume only two values.

Scale of measurement (rows), with examples for each size of range set:

  Ratio: If $x_A > x_B$, A is $x_A / x_B$ times greater than B and
  $x_A - x_B$ units greater than B.
    Continuous: Temperature in °K, weight, height, age.
    Discrete: Counts such as number of cars, persons, etc.
    Binary: Unit price of drinks in vending machines, e.g.: cups, 10¢; cans, 15¢.

  Interval: If $x_A > x_B$, A is $x_A - x_B$ units greater than B.
    Continuous: Temperature in °C, specific gravity.
    Discrete: Serial numbers, TV channel nos.
    Binary: How many wives do you have? (only 0 or 1).

  Ordinal: Either $x_A > x_B$ or $x_A < x_B$.
    Continuous: Human judgements of texture, etc.
    Discrete: Military rank, (wide, medium, narrow), etc.
    Binary: Tall-short, good-bad, etc.

  Nominal: Either $x_A = x_B$ or $x_A \neq x_B$.
    Continuous: (none)
    Discrete: Eye color, place of birth, etc.
    Binary: Yes-no, on-off, true-false.

Table 1: Cross-classification of attributes with examples.


For the purpose of computing the dissimilarity matrix S, scale
conversions may be needed prior to combining the representations of any pair of
objects $X_i$ and $X_j$ into an S entry of the form $s_{ij} = f(X_i, X_j)$. Scale
conversion techniques applicable to the attributes listed in Table 1 are available
(see [Anderberg 73]). The specific strategy employed to compute S is of no central
concern to our discussion here; it will suffice to say that most cluster analysis
techniques assume that the dissimilarity coefficients are "metrics", one type
of distance function. Several such functions are discussed in [Anderberg 73].
Metrics are characterized by the following properties:

1) $s_{ij} = 0$ iff $i = j$.

2) $s_{ij} \geq 0 \quad \forall i,j$.

3) $s_{ij} = s_{ji} \quad \forall i,j$.

4) $s_{ij} \leq s_{ik} + s_{kj} \quad \forall i,j,k$.

A distance function having properties 1, 2 and 3 but not property 4
is called a "semimetric". If, on the other hand, property 4 is replaced by

4') $s_{ij} \leq \max(s_{ik}, s_{kj}) \quad \forall i,j,k$,

a distance function with properties 1, 2, 3 and 4' is called an "ultrametric",
since 4' is considerably stronger than 4.
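The properties above can be checked mechanically for a small matrix. The sketch below is only an illustration: the function name `classify_distance`, the tolerance parameter and the three sample matrices are assumptions introduced here. It tests properties 1 to 3 first, then 4', then 4.

```python
from itertools import product


def classify_distance(s, tol=1e-12):
    """Classify a square matrix as ultrametric, metric, semimetric, or none of these."""
    idx = range(len(s))
    # Properties 1-3: zero exactly on the diagonal, non-negative, symmetric.
    basic = all(
        s[i][j] >= 0
        and (s[i][j] == 0) == (i == j)
        and abs(s[i][j] - s[j][i]) <= tol
        for i, j in product(idx, idx)
    )
    if not basic:
        return "not a distance function"
    triples = list(product(idx, idx, idx))
    # Property 4': the stronger "ultrametric" inequality.
    if all(s[i][j] <= max(s[i][k], s[k][j]) + tol for i, j, k in triples):
        return "ultrametric"
    # Property 4: the triangle inequality.
    if all(s[i][j] <= s[i][k] + s[k][j] + tol for i, j, k in triples):
        return "metric"
    return "semimetric"


# Hypothetical 3 x 3 examples.
print(classify_distance([[0, 3, 4], [3, 0, 5], [4, 5, 0]]))  # metric
print(classify_distance([[0, 2, 2], [2, 0, 1], [2, 1, 0]]))  # ultrametric
print(classify_distance([[0, 1, 5], [1, 0, 1], [5, 1, 0]]))  # semimetric
```

Since every ultrametric is also a metric, the sketch reports the strongest class that applies.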




4.-  Basic  cluster  analysis  approaches. 


Once a dissimilarity matrix $S = (s_{ij})$ of "distances" between pairs of
objects in a set O is available, there are two main approaches, discussed below,
that make use of the information in that matrix to identify clusters, i.e., subsets
of O characterized by the fact that their elements are similar, or "close together",
in some sense. From a slightly different perspective, clusters can be viewed as
groups of objects such that the members of a given cluster are closer to any of the
objects in the same cluster than to some other object in another cluster. This
view is at the root of many cluster analysis algorithms.

The concept of cluster as described above is not very precise, in the
sense that no concrete characterization of cluster can be derived from it. For
example, how close must an object be to the members of a given cluster for it to
be considered an element of the same cluster (as opposed to being assigned to
a new cluster)? This question hinges on two related aspects of the cluster
analysis methodology as it exists today, i.e.: (i) into how many subsets (clusters)
should the original set O be decomposed?, and (ii) how coherent should these
clusters be? There are two trivial answers to these questions:


(i) The minimum number of clusters that may result is 1, the entire
set O; the maximum is n, i.e., n clusters with one object each.

(ii) Minimum cluster coherence is achieved with only one cluster
(assuming that the set O has more than one element); maximum cluster coherence
is achieved with n clusters of one object each.


Of  course,  neither  extreme  is  meaningful.  Cluster  analysis  is  concerned 
with  some  middle  ground  solution.  Several  parameters  can  be  used  to  characterize 
such  a solution.  For  example,  we  can  say  that  the  distance  between  two  elements 
in  any  given  cluster  should  not  be  greater  than  a given  amount,  or  that  we  wish 
to  obtain  a certain  number  of  clusters.  Alternatively,  measures  can  be  developed 
to evaluate, globally, a given decomposition of the set O, and algorithms devised
to  optimize  them. 

The  cluster  analysis  techniques  currently  available  can  be  classified 
in  two  main  families.  They  work  towards  a middle  ground  solution  starting  at  one 
of  the  two  extremes  just  mentioned.  We  discuss  them  briefly  below. 

4.1.-  Agglomerative  techniques. 

The techniques in one of these families are generically called
"agglomerative". They start with a set of n one-member clusters and try to reduce
the number of clusters as dictated by some meaningful criteria. Typically, these
techniques go all the way until the number of clusters is reduced to 1 (the
entire set O). The order in which elements are assigned to clusters that will
eventually merge into the complete original set is then used to identify a
reasonable set of clusters. The general structure of this family of techniques
is as follows:


1.- Begin with n clusters, each containing one object. Let the clusters be
labelled with the numbers 1 through n.

2.- Search the dissimilarity matrix for the most similar pair of clusters.
Let the chosen clusters be p and q.

3.- Reduce the number of clusters by 1 by merging clusters p and q. Label
the product of the merger q (say q < p by convention). Update the
dissimilarity matrix to reflect the revised dissimilarity coefficients
between the new cluster q and all other existing clusters. (Note that the
matrix now contains distances between clusters, not between objects.)
Delete the row and column pertaining to cluster p.

4.- If the current number of clusters is greater than 1, go to 2. Otherwise
stop: all objects are in one cluster.

Several techniques in this family can be developed by varying the
procedures used for defining the most similar pair of clusters at step 2 and
for updating the matrix at step 3. (Note that distances between clusters are
not as well defined as distances between objects are: for example, they can be
defined as the maximum, or the minimum, distance between any pair of objects, one
in each cluster; or as the distance between the so-called "centroids" of the two
clusters, etc.)
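The four steps can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the function name `agglomerate` and the sample data are hypothetical, labels are 0-based, and instead of updating the matrix in place the sketch recomputes each cluster-to-cluster distance from the object distances, which gives the same result for the minimum and maximum definitions mentioned above.

```python
def agglomerate(s, linkage=min):
    """Apply steps 1-4 to a symmetric dissimilarity matrix s (list of lists).

    linkage=min uses the minimum object-to-object distance between two
    clusters, linkage=max the maximum. Returns the mergers in order, each as
    (members of q, members of p, distance at which they merged).
    """
    # Step 1: n one-object clusters, labelled 0..n-1 (0-based, unlike the text).
    clusters = {label: [label] for label in range(len(s))}

    def distance(a, b):
        return linkage(s[i][j] for i in clusters[a] for j in clusters[b])

    mergers = []
    while len(clusters) > 1:  # Step 4: repeat until one cluster remains.
        labels = sorted(clusters)
        # Step 2: most similar pair of clusters, with q < p by construction.
        d, q, p = min(
            (distance(a, b), a, b)
            for pos, a in enumerate(labels)
            for b in labels[pos + 1:]
        )
        mergers.append((list(clusters[q]), list(clusters[p]), d))
        # Step 3: merge p into q; dropping p stands in for deleting its row and column.
        clusters[q].extend(clusters.pop(p))
    return mergers


# Hypothetical objects placed on a line; dissimilarity is the absolute difference.
positions = [0, 1, 5, 6, 20]
S = [[abs(a - b) for b in positions] for a in positions]
for merged_q, merged_p, d in agglomerate(S):
    print(merged_q, "+", merged_p, "at distance", d)
```

The recorded sequence of mergers is the information an analyst would inspect afterwards to choose a middle ground solution; a large jump in the merge distance is one informal signal of where to stop.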

Agglomerative techniques of this kind are appealing for the following
reasons:

- In a sense, they scan the ground between the two extreme solutions mentioned
above in such a way that when the process is complete the analyst can decide upon an
appropriate middle ground solution (see [Choffray 77] for a method to make such
a decision). For example, consider Fig. 1. The objects represented as points
there could be successively clustered as shown (i.e., first A with B, then this
cluster with C, etc.). Cluster "5" is the complete set. But clusters "2" and "4",
the only ones existing just prior to the merger that produces "5", form possibly
(and intuitively) the best set of clusters for this case.


Fig.  1 

- They explicitly display a hierarchical arrangement of clusters that can
be useful to interpret the chosen partition.

- They  scan  the  ground  between  the  two  extremes  without  enumerating  all 
the  possible  clusters,  so  that  they  tend  to  be  fast. 

On the other hand, they make early decisions that are never
reconsidered and which may not be appropriate. For example, consider Fig. 2, which
represents a two-dimensional case (i.e., two attributes) that can be drawn on
the plane. In the circumstances of Fig. 2, objects A and B, the closest pair,
would be assigned to the same cluster at the very first step, but this is
intuitively wrong from looking at the plot.


Fig. 2

4.2.-  Partitioning  techniques. 

The other generic family of cluster analysis techniques can be called
"partitioning" techniques. The idea behind them is to start with the complete
set of objects as a single cluster and then proceed by partitioning it into subsets
by applying a series of rules whose goal is to improve the current partition. For
this purpose, some kind of "objective function" is used, which is the counterpart
of the function employed to decide what clusters to merge next in the agglomerative
techniques discussed above. There are two main subfamilies of partitioning
techniques:


(a) "True" partitioning techniques, which start off with the entire
set of objects as a single cluster and keep partitioning it while an objective
function keeps improving. The way in which successive partitions are generated
is not always "general" in the sense that not all possible partitions are
considered for adoption.

(b) "Initial partition" techniques, which proceed as follows: (1) An
eventual number of clusters k is decided upon a priori, (2) either a k-partition
or k so-called "leader" objects (each corresponding to one of the eventual
clusters) are identified, and (3) objects are assigned or re-assigned to clusters
while improvement results. Several methods for identifying initial partitions
or leader objects have been proposed, but none is recognized as the absolute
best.

Techniques  in  this  family  are  in  general  more  iterative  (and  thus  more 
time  consuming)  than  agglomerative  techniques;  on  the  other  hand,  they  tend  to 
avoid  the  main  problem  that  characterizes  agglomerative  techniques:  since  early 
decisions  regarding  which  object  goes  to  what  cluster  may  be  revised  as  the 
algorithm  proceeds,  they  have  a chance  to  be  changed  if  this  seems  appropriate. 
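One possible instance of an "initial partition" technique of type (b) is sketched below. It is an assumption-laden illustration, not a method taken from the text: the re-assignment rule (closest leader) and the leader update rule (the member with the smallest total distance to its own cluster, a medoid-style choice) are swapped in for concreteness, and the name `leader_partition` and the sample data are hypothetical. The initial leaders are assumed to be distinct objects.

```python
def leader_partition(s, leaders, max_iter=100):
    """Assign objects to k clusters around leader objects, revising both as needed.

    s is a dissimilarity matrix (list of lists); leaders is a list of k distinct
    object indices chosen a priori. Returns (clusters, final leaders).
    """
    leaders = list(leaders)
    clusters = []
    for _ in range(max_iter):
        # Assign (or re-assign) every object to the cluster of its closest leader.
        clusters = [[] for _ in leaders]
        for i in range(len(s)):
            nearest = min(range(len(leaders)), key=lambda c: s[i][leaders[c]])
            clusters[nearest].append(i)
        # Assumed update rule: the new leader is the member with the smallest
        # total distance to the other members of its cluster.
        new_leaders = [
            min(members, key=lambda o: sum(s[o][j] for j in members))
            for members in clusters
        ]
        if new_leaders == leaders:  # no further improvement: stop
            break
        leaders = new_leaders
    return clusters, leaders


# Hypothetical objects on a line, with a deliberately poor choice of leaders.
positions = [0, 1, 2, 10, 11, 12]
S = [[abs(a - b) for b in positions] for a in positions]
print(leader_partition(S, leaders=[0, 1]))
```

In this example the first assignment puts five of the six objects in one cluster; later passes revise that early decision, which is the behaviour that distinguishes this family from the agglomerative one.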

The central idea driving most of these techniques can be expressed in
a mathematical programming framework; however, solving the resulting optimization
problem is often impractical because of its characteristics (large size, binary
decision variables, non-linearity). An example of this situation is presented
in the next section.


5.-  Graph  decomposition  problems  and  techniques;  similarities  to  cluster  analysis. 

While the decomposition techniques outlined in the preceding sections
work with a "dissimilarity matrix" whose entries meet certain conditions
(basically they are assumed to be metrics), there is a general form of decomposition
problem that does not meet these conditions but which is of interest to us:
Consider a situation where pairs of objects in the initial set O are known to be
related in one of two ways, e.g., they are either related or unrelated.

If we try to put this problem in the framework described in sections
2 and 3, we immediately think of a distance matrix whose entries are of only two
types: given a pair of objects, they are either "close" (related) or not, so
that matrix entries can take only two possible values. Assume we assign a value
of 1 to the entries corresponding to pairs of related objects and a value of 0
to those corresponding to pairs of unrelated objects. A matrix of this form would
be a "similarity" matrix rather than a dissimilarity one. Switching the assignment
strategy for 1's and 0's would result in a matrix more like a dissimilarity
matrix, but its entries would not meet, in general, the conditions needed for
them to be metrics (see, for instance, the matrix in Fig. 3: $s_{12} > s_{13} + s_{32}$).


This means that such a matrix cannot be used in the cluster analysis
techniques discussed above. Nevertheless, we have intuitive feelings about how
the decomposition problem should be attacked in this case: The eventual clusters
should be groups of objects whose elements are strongly pairwise related, while
elements in different subsets should have few pairwise relationships. This can
be formalized and a strategy devised for decomposing the initial set of objects
into subsets with these characteristics. Moreover, it is possible to derive
dissimilarity matrices from the initial binary matrix in such a way that the
resulting entries are metrics. This means that we can look at this problem from
either the point of view of traditional cluster analysis or from that of
methodologies appropriate for binary matrices; as it turns out, this dual possibility
suggests ways of improving techniques in both settings.
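The failure of the metric conditions for a binary matrix can be seen on a very small case. The sketch below uses a hypothetical three-object example (not the matrix of Fig. 3): object 2 is related to both 0 and 1, but 0 and 1 are unrelated. Related pairs receive a dissimilarity of 0 and unrelated pairs a dissimilarity of 1; the names `related` and `violations` are assumptions of the example.

```python
from itertools import product

# Hypothetical relationships among three objects (0-based labels).
related = {(0, 2), (1, 2)}
n = 3


def dissimilarity(i, j):
    """0 for an object with itself or for a related pair, 1 for an unrelated pair."""
    return 0 if i == j or (i, j) in related or (j, i) in related else 1


S = [[dissimilarity(i, j) for j in range(n)] for i in range(n)]

# Triples (i, j, k) for which the triangle inequality s_ij <= s_ik + s_kj fails.
violations = [
    (i, j, k)
    for i, j, k in product(range(n), repeat=3)
    if S[i][j] > S[i][k] + S[k][j]
]
print(violations)
```

The unrelated pair (0, 1) is at distance 1, yet both are at distance 0 from object 2, so the inequality fails for that pair. The zero entries off the diagonal also break the requirement that the distance be 0 only for an object and itself.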

In this section we discuss the binary matrix case with the convention
that 1's are assigned to pairs of related objects. This convention permits the
representation of the initial set of objects as a graph whose "adjacency matrix"
(see [Deo 74]) corresponds to the similarity matrix in the terminology above.

Two ways of solving this decomposition problem will be briefly
described; two strategies for deriving a dissimilarity matrix meeting the metric
conditions will be proposed, and the decomposition techniques applicable to the latter
compared with those suitable for the former; both can benefit from the comparison.

5.1.-  Graph  decomposition  techniques. 

Given a set of objects whose interdependencies are characterized by a binary
similarity matrix, the decomposition problem becomes a graph decomposition problem.
As described in [Andreu & Madnick 77], a measure M can be developed that evaluates
how "good" a partition of the original set into subsets is. This measure is such
that its value is higher the better the partition, so that it should be
maximized across all possible partitions.

If the eventual number of subsets is set a priori (as in many
non-agglomerative clustering techniques), the problem can be formalized as a non-linear
integer programming problem, as follows.

Let:

- $S = (s_{ij})$ be the (n x n) similarity (graph adjacency) matrix, i.e.:
$s_{ij} = 1$ iff objects $o_i$ and $o_j$ are related
(for the purposes of the formulation below, we assume that $s_{ij} = 0$ if
i=j, although this is counterintuitive and will be changed later),

- K be the eventual number of clusters (set a priori),

- $g_{ki}$ (k=1, ..., K; i=1, ..., n) be a set of binary decision variables;
$g_{ki}$ is set to 1 if object $o_i$ is assigned to cluster k, to 0 otherwise,

- $M(g_{11}, \ldots, g_{Kn})$ be the value of the measure M corresponding to the
partition defined by $g_{ki}$ (k=1, ..., K; i=1, ..., n).

Then the decomposition problem can be written as:

$$\max \; M(g_{11}, \ldots, g_{Kn})$$

s.t.:

$$\sum_{k} \sum_{i} g_{ki} = n$$ (to ensure that each object is assigned to one and
only one cluster)

$$1 \geq g_{ki} \geq 0, \quad g_{ki} \text{ integers}, \quad k=1, \ldots, K; \; i=1, \ldots, n,$$

where M can be expressed in terms of the decision variables $g_{ki}$ and the matrix
entries $s_{ij}$ (see [Andreu & Madnick 77] for the explicit expression).




This formulation is impractical given the characteristics of the
resulting problem: non-linearity and a large number of binary decision variables
(nK). Although it conveys the general structure of the graph decomposition
problem, it suggests that heuristic approaches may be more appropriate. In the
next section we explore one such approach, which is also useful for identifying a
way of putting the graph problem in the framework of cluster analysis.
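To give a feeling for the size of the search space, the sketch below solves a toy instance by brute force, enumerating every assignment of n objects to K non-empty clusters (up to K^n candidates). The objective used here, `stand_in_measure`, is a deliberately simple stand-in (links inside clusters minus links between clusters) and is NOT the measure M of [Andreu & Madnick 77]; it, the function `best_partition` and the sample graph are assumptions introduced only for illustration.

```python
from itertools import product


def stand_in_measure(s, assignment):
    """Stand-in objective (not the measure M): within-cluster links minus between-cluster links."""
    score = 0
    n = len(s)
    for i in range(n):
        for j in range(i + 1, n):
            if s[i][j]:
                score += 1 if assignment[i] == assignment[j] else -1
    return score


def best_partition(s, K, measure=stand_in_measure):
    """Exhaustively search all assignments of objects to exactly K non-empty clusters."""
    n = len(s)
    candidates = (a for a in product(range(K), repeat=n) if len(set(a)) == K)
    return max(candidates, key=lambda a: measure(s, a))


# Hypothetical graph: two triangles (0,1,2) and (3,4,5) joined by the link 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
S = [[0] * 6 for _ in range(6)]
for i, j in edges:
    S[i][j] = S[j][i] = 1

print(best_partition(S, K=2))  # cluster label of each object
```

Six objects and two clusters already mean 64 candidate assignments; the count grows as K^n, which is why enumeration is not a realistic option beyond very small problems.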


5.1.1.- A heuristic approach.

In this subsection we describe a heuristic approach that has proven
effective in a few graph decomposition problems where we have actually employed
it. It is "subgraph strength" driven, in the sense that it attempts to identify
the "core" of subsets of nodes likely to have high "strength" (see [Andreu &
Madnick 77] for a definition of this term).

For exposition purposes, the following terminology will be used:

Let the pair (O,A) represent a graph:

- O: {o_i, i = 1, ..., n}, the set of (n) nodes,

- A: {a_ij; a_ij = 1 if nodes o_i and o_j are related, = 0 otherwise}; here we
  assume a_ij = 1 when i = j (*).

We define:

- The "core set" CS_i associated with a node o_i to be the set
  CS_i: {o_j | o_j s.t. a_ij = 1}, i.e., the set of all nodes related to o_i,
  including o_i itself, and

- The "connectivity" of node o_i to be
  c_i = |CS_i| - 1, where by |X| we mean the dimension of set X.

(*) If we think of A as the graph adjacency matrix, this would mean that all the
nodes have "self-loops" ([Deo 74]); in our context this is intuitively correct:
a node is related to itself.
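The two definitions can be illustrated with a minimal sketch. It assumes the graph is stored as a 0/1 adjacency matrix (a list of lists) with ones on the diagonal; the function names and the 4-node example matrix are illustrative and do not come from the report.

```python
def core_sets(adjacency):
    """Return CS_i for every node i as a set of node indices (i itself included)."""
    n = len(adjacency)
    return [{j for j in range(n) if i == j or adjacency[i][j] == 1}
            for i in range(n)]


def connectivity(adjacency):
    """Return c_i = |CS_i| - 1 for every node i."""
    return [len(cs) - 1 for cs in core_sets(adjacency)]


# Hypothetical 4-node chain 0 - 1 - 2 - 3, with self-loops on the diagonal.
A = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]
print(core_sets(A))
print(connectivity(A))
```

With this representation the connectivity of a node is simply its number of neighbours, since the self-loop is subtracted out.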




Intuitively, we are interested in nodes o_i ∈ O with high connectivity
and whose associated core sets do not interfere too much with one another, since
those are likely to be at the kernel of subsets of nodes whose elements are
strongly related. Once a number of such "kernel subsets" are identified, the
remaining nodes can be assigned to the subsets in which they best "fit", according
to some criteria consistent with the maximization of the overall measure M (see
section 5.1.1).

Following  this  intuition,  the  identification  of  kernel  subsets  can  be 
done  iteratively  using  the  procedure  specified  below: 

0)  Set J = 0.

1)  Compute c_i ∀ o_i ∈ O. If c_i = c_j ∀ i,j, set J = J+1; KESU(J) = O; stop.

2)  Consider the k (> 1, a number specified a priori; see the end of this
    section for considerations about its value) nodes with highest c_i. Without
    loss of generality, assume that these are the nodes o_1, ..., o_k.

3)  Determine CS_i for o_i ∈ {o_1, ..., o_k}.

4)  Compute KS_i = (CS_i ∩ [∪_{j=1, j≠i}^{k} CS_j])  ∀ o_i ∈ {o_1, ..., o_k}.

5)  Select o_p ∈ {o_1, ..., o_k} such that

        |KS_p| = min_{i=1, ..., k} (|KS_i|).

6)  Set J = J+1;

    If |KS_p| = |CS_p|, set KESU(J) = O and stop,

    else set KESU(J) = o_p ∪ [CS_p - KS_p].

7)  Recompute O:

        O = O - KESU(J); if |O| = 0, stop.

8)  Recompute A:

        A = {a_ij | a_ij = old a_ij iff o_i, o_j ∈ O;
             mark it "nonexistent" otherwise}.

9)  k = k - 1;

    If k > |O|, set k = |O|;

    Go to 1.
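The procedure can be sketched in Python as follows. This is an illustrative reading of steps 0 to 9, not the author's implementation. Assumptions: the graph is a dictionary mapping each node to the set of its neighbours (self excluded); ties at steps 2 and 5 are broken by the smallest node label, where the report says "arbitrarily"; and k is never allowed to drop below 1 at step 9, a case the report leaves open. The two example graphs are inferred from the core sets listed with Figs. 4 and 6, since the drawings themselves are not available.

```python
def kernel_subsets(neighbors, k):
    """Return the kernel subsets KESU(1), ..., KESU(J) as a list of sets."""
    nodes = set(neighbors)          # the current node set O
    kesu = []
    while nodes:
        # Steps 1 and 8: core sets and connectivities on the remaining nodes.
        cs = {i: ({i} | neighbors[i]) & nodes for i in nodes}
        c = {i: len(cs[i]) - 1 for i in nodes}
        if len(set(c.values())) == 1:       # step 1: all c_i equal, stop
            kesu.append(set(nodes))
            break
        # Step 2: the k nodes with highest connectivity.
        top = sorted(nodes, key=lambda i: (-c[i], i))[:k]
        # Step 4: part of CS_i that interferes with the other k-1 core sets.
        ks = {}
        for i in top:
            others = set().union(*(cs[j] for j in top if j != i))
            ks[i] = cs[i] & others
        # Step 5: the core set with the least interference.
        p = min(top, key=lambda i: (len(ks[i]), i))
        # Step 6: complete interference means no kernel stands out.
        if len(ks[p]) == len(cs[p]):
            kesu.append(set(nodes))
            break
        kernel = {p} | (cs[p] - ks[p])
        kesu.append(kernel)
        # Step 7: remove the kernel subset from O.
        nodes -= kernel
        # Step 9 (the lower clamp to 1 is an assumption of this sketch).
        k = max(1, min(k - 1, len(nodes)))
    return kesu


# Graph assumed for Fig. 6: two triangles {1,3,4} and {2,5,6} joined by link 1-2.
fig6 = {1: {2, 3, 4}, 2: {1, 5, 6}, 3: {1, 4}, 4: {1, 3}, 5: {2, 6}, 6: {2, 5}}
# Graph assumed for Fig. 4: the chain 1 - 2 - 3 - 4.
chain = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}

print(kernel_subsets(fig6, 2))
print(kernel_subsets(chain, 4))
print(kernel_subsets(chain, 2))
```

Restricting the core sets to the remaining nodes on every pass plays the role of step 8 (marking removed entries of A as "nonexistent").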

Once this procedure is executed (stopping points are possible at steps
1, 6 and 7), we are left with J kernel subsets KESU(1), ..., KESU(J).

Two cases are possible:

1)  J = 1.

This basically tells us that no subset of nodes stands out as a clear
candidate kernel subset, i.e., the graph has no meaningful structure. For example,
Figures 4 and 5 depict two cases where this situation arises (the execution of
the procedure described above is summarized next to each graph). The result is
intuitively correct in both cases, particularly for the graph in Fig. 5. The
value assumed for the a priori parameter k, however, determines the result which
is achieved. We will have something to say about such a value at the end of this
section; an intuitive interpretation of the behavior generated by different k
values is presented later in section 5.2.2.

Fig. 5 also illustrates the rationale behind the stopping rule at step
1 of the procedure, which is not as intuitive as the ones at steps 6 and 7. The
rationale is as follows: if, for a given graph, it turns out that c_i = c_j ∀ i,j,
there is no apparent structure in it. Note that c_i = c_j = (say) a ∀ i,j implies
that every node in the graph has exactly a links incident to it (incidentally,
note also that this means that there are na/2 links in the graph, so that this
circumstance can only occur if na is even, which is indeed the case in Fig. 5);
thus, there is no way that such links can be arranged so as to display clearly
separable subgraphs. An extreme case occurs for a = n-1 (as in Fig. 5); it implies
that the graph is completely connected, i.e., we can interchange the labels of any
pair of nodes and still obtain the same adjacency matrix, which obviously suggests
that no single node is different in any respect, as far as graph structure is
concerned, from any other. The only reasonable alternative in this case is to
consider the entire graph to be a single subgraph. The stopping rule at step 1
makes this decision early, thus saving computations (note, for instance, that
if we do not stop at step 1 in Fig. 5 we still reach the same conclusion).


    Assume k = 4.

    1.-  c_1 = c_4 = 1; c_2 = c_3 = 2.
         CS_2 = {o_1, o_2, o_3}
         CS_3 = {o_2, o_3, o_4}
    4.-  KS_i = CS_i, i = 1, 2, 3, 4.
    5.-  (Arbitrarily) KS_p = KS_1 = CS_1.
    6.-  |KS_p| = |CS_p|: Stop;
         J = 1, KESU(1) = O.

                        Fig. 4


    Assume k = 2.

    1.-  c_1 = c_2 = c_3 = c_4 = 3; Stop: KESU(1) = O.

    If we didn't stop here, we would proceed:

    2.-  (Arbitrarily) pick o_1, o_2.
    3.-  CS_1 = CS_2 = {o_1, o_2, o_3, o_4}.
    4.-  KS_1 = KS_2 = CS_1 = CS_2.
    5.-  (Arbitrarily) KS_p = KS_1 = CS_1.
    6.-  |KS_p| = |CS_p|: Stop;
         J = 1, KESU(1) = O.

                        Fig. 5


2)  J > 1. 

In this case, some subsets of nodes stand out as "good" kernel subsets
that can subsequently be completed with the remaining nodes if they do not span
the complete set O. An example is presented in Fig. 6. It results in two kernel
subsets, namely, {o_1, o_3, o_4} and {o_2, o_5, o_6}, that in this case span the
complete set O and thus represent a partition already. This partition turns out
to be the best of all possible partitions of that graph as evaluated by the
measure M introduced before.

Now we turn to a discussion about k, the parameter that has to be
specified a priori in order to use the procedure described above. Consider
again Fig. 4: we assumed k = 4; this led to the identification of a single
subgraph, the entire original graph. Had we taken k = 2, the procedure would
have produced two kernel subsets of nodes: {o_1, o_2} and {o_3, o_4}, that already
represent a partition. As measured by M, this is the best two way partition
of that graph, but still inferior to the "no partition" solution. However, from
a strict graph structure standpoint, the two way partition is superior, since
it is apparent that the subgraphs {o_1, o_2} and {o_3, o_4} are structurally
identical and thus display some special configuration inherent to the initial
graph (see section 5.2.2 below for a more intuitive interpretation of this).

What can then be said about possible values for k? An obvious upper
bound is n, the total number of nodes in the initial graph. In most cases,
however, choosing k = n will result in a single kernel subset: the entire graph.
The reason for this is that if we take into consideration the "core sets" of all
nodes, chances are that their interferences will be so considerable that no one of
them will stand out as clearly isolated; the procedure will tend to stop at step 6
(i.e., the best core set is completely interfered with by all the others). This is
illustrated in Fig. 5. Thus, k should not be set as high as n or any close
value.


    Assume k = 2.

    1.-  c_1 = c_2 = 3
    3.-  CS_1 = {o_1, o_2, o_3, o_4}
         CS_2 = {o_1, o_2, o_5, o_6}
    4.-  KS_1 = {o_1, o_2}
         KS_2 = {o_1, o_2}
    5.-  (Arbitrarily) KS_p = KS_1
    6.-  |KS_p| < |CS_p|;
         KESU(1) = {o_1, o_3, o_4}
    7.-  O = {o_2, o_5, o_6}

    3.-  CS_2 = {o_2, o_5, o_6}
         KS_2 = CS_2,
         KESU(2) = {o_2, o_5, o_6}; stop

                        Fig. 6


A lower bound is 1, but in general this is not an appropriate value
either. The reason is that it can lead to very undesirable results because
basically it is equivalent to picking a good core set without worrying about
whether it interferes with others or not. For example, consider the graph in
Fig. 7. Taking k = 1 results in a five-node kernel subset, likely
to preclude from further consideration the intuitive partition shown in the
figure.


Fig.  7 


Therefore, k should be such that 1 < k < n. This is not a very strong
assertion. Our experience indicates that k can be set to a value somewhat higher
than the expected number of subgraphs. Furthermore, varying k has the following
intuitive effect: the greater k, the more "conservative" we are, in the sense
that fewer subgraphs will be identified because we consider interferences among
many core sets. Experimenting with a few k values and looking for stability in
the results is usually a good strategy.

A less ad-hoc strategy for choosing k is as follows: After step 1, order
the c_i's (i = 1, ..., n) in decreasing order (this has to be done anyway). Then,
pick the node with maximum c_i and all the following ones that have a c_i higher
than a given percentage (< 100) of that maximum. This percentage figure still has
to be specified a priori, but it seems a more unified measure of "how risky" we
wish to be. For example, picking 80% leads to k = 2 for the examples in Figs. 4
and 6. Any percentage figure leads to k = 4 for Fig. 5, which is consistent with the
fact that the graph has no apparent subgraphs in the sense we are interested in
here.
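The percentage rule can be sketched in a few lines. The function name `choose_k` is hypothetical. "Higher than" is read as a strict comparison, and a floor of 1 is added as an assumption for the degenerate case in which every connectivity is zero. The connectivity lists used below are the ones implied by the core sets given with Figs. 4, 5 and 6.

```python
def choose_k(connectivities, percentage):
    """Count the nodes whose c_i exceeds `percentage` percent of the maximum c_i."""
    if not 0 < percentage < 100:
        raise ValueError("percentage must lie strictly between 0 and 100")
    ordered = sorted(connectivities, reverse=True)
    cutoff = ordered[0] * percentage / 100.0
    # The floor of 1 is an assumption of this sketch, not part of the report.
    return max(1, sum(1 for c in ordered if c > cutoff))


print(choose_k([1, 2, 2, 1], 80))         # connectivities of Fig. 4
print(choose_k([3, 3, 2, 2, 2, 2], 80))   # connectivities of Fig. 6
print(choose_k([3, 3, 3, 3], 80))         # connectivities of Fig. 5
```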

5.1.1.1.- Comments on a possible generalization of the core set concept.

At this point, a possibility for generalizing the core set
concept comes to mind. It is worth investigating, briefly, what happens
if we generalize the definition of core set to read as follows:

    CS_i: {o_j | o_j s.t. minimum path (o_i → o_j) ≤ p; p ≥ 1},

where p = 1 would produce the definition given before.

Although this definition seems intuitively sound, a more careful
investigation shows that it makes no good sense in general. To see why,
it is useful to view the concept of core set from the following perspective:




The construction of a core set in a given graph is equivalent to marking
each node in the graph in a binary fashion (i.e., a specific node either
belongs to the core set or it doesn't). In this sense, two nodes are
considered equivalent (with respect to the node whose core set is being
constructed, which can be called the "core set node", CSN) when the
minimum paths from the CSN to each of these two nodes are of length ≤ p. From
a strict graph structure viewpoint, this is wrong, unless p = 1. The
reason is that if p > 1 the two paths may in fact (i) differ in length or
(ii) be completely different even if their lengths are equal. Thus, taking
p > 1 in the definition of core set loses information because it
summarizes in a binary output a comparison that can produce more than two
outcomes in general.

Therefore,  if  we  wish  to  generalize  the  definition  of  core  set  to 
apply  for  p > 1 we  should  take  into  account  all  possible  comparison  outcomes 
and  make  two  nodes  equivalent  with  respect  to  a third  (the  CSN),  in  the 
sense  above,  only  when  these  two  coincide  exactly,  i.e.,  when: 

(a)  The minimum path length from either node to the CSN is the
     same; and

(b)  The two minimum paths are exactly the same, except for the
     last node.

It can be shown in general that these two conditions imply taking
p = 1, as illustrated below.

Since condition (b) implies condition (a), all we need to prove
is the following:



Proposition

If in a given graph it is true that:

    Minimum path (o_i → o_j) = Minimum path (o_i → o_k)  (except for the
                                                           last node),

then either

1)  o_j and o_k are both adjacent to o_i, or

2)  o_i is adjacent to neither o_j nor o_k

(i.e., o_j and o_k are made equivalent by assigning them to CS_i
only when they both belong to the CS_i defined with p = 1).


Consider Figure 8:


Figure 8


For the proposition condition to be true, the path inside the
dotted circle must be common to path (o_i → o_j) and path (o_i → o_k).

If we assume this to be true, only two cases need to be
considered. Either:

  • The circled path is empty, which implies (1) above;

or

  • The circled path is non-empty, which implies (2) above
    because if either o_j or o_k (or both) were adjacent to o_i,
    the circled path would not be part of a path of minimum
    length. Q.E.D.


5.2.-  Putting  the  graph  decomposition  problem  in  the  cluster  analysis  framework. 

As  was  mentioned  earlier  in  section  5,  graph  decomposition  problems 
are not readily approachable by means of cluster analytic techniques because the
adjacency  matrix  of  a graph  fails  to  meet  the  conditions  assumed  for  similarity 
matrices.  Since  many  cluster  analytic  techniques  already  exist,  resolving  this 
incompatibility  would  permit  the  use  of  this  available  knowledge. 

There  are  at  least  two  ways  of  doing  so,  as  described  below. 

5.2.1.- "Minimum path" dissimilarity measures.

One way is to derive a dissimilarity (distance) matrix from the graph
adjacency matrix as follows:

Using the same notation as in section 5.1.2 and calling the resulting
distance matrix S(s_ij), let

    s_ij = Number of links in the minimum path from o_i to o_j.

The resulting s_ij's can be shown to be metrics (see Appendix I).

For example, the graph in Fig. 6 results in the S matrix shown in Fig. 9.



    S   |  1   2   3   4   5   6
    ----+------------------------
    1   |  0   1   1   1   2   2
    2   |  1   0   2   2   1   1
    3   |  1   2   0   1   3   3
    4   |  1   2   1   0   3   3
    5   |  2   1   3   3   0   1
    6   |  2   1   3   3   1   0

                Fig. 9
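A minimal sketch of the "minimum path" distance, using breadth-first search from every node. The function name and the dictionary representation are illustrative. The edge list for the Fig. 6 graph is inferred from the core sets listed with that figure, since the drawing is not available. Unreachable pairs, which the report does not discuss, are returned as None.

```python
from collections import deque


def min_path_distances(neighbors):
    """Return the matrix of minimum path lengths (in links), rows in sorted node order."""
    nodes = sorted(neighbors)
    matrix = []
    for src in nodes:
        dist = {src: 0}
        queue = deque([src])
        while queue:                       # breadth-first search from src
            u = queue.popleft()
            for v in neighbors[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        matrix.append([dist.get(j) for j in nodes])
    return matrix


# Graph assumed for Fig. 6: triangles {1,3,4} and {2,5,6} joined by link 1-2.
fig6 = {1: {2, 3, 4}, 2: {1, 5, 6}, 3: {1, 4}, 4: {1, 3}, 5: {2, 6}, 6: {2, 5}}
for row in min_path_distances(fig6):
    print(row)
```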




5.2.2.-  "Connectivity"  dissimilarity  measures. 

Another way of deriving a distance matrix for the nodes in a graph is
inspired by the concept of "core set" introduced in the heuristic approach discussed
in section 5.1.2. Entries in the distance matrix can be computed as follows:

                 |CS_i ∩ CS_j|
    s_ij = 1 - ---------------
                 |CS_i ∪ CS_j|

where CS_i is the core set of node o_i (see 5.1.2).

It can be shown (see Appendix II) that the s_ij's so defined meet
conditions 2, 3 and 4 of section 3, which characterize metric distances. Condition 1
is not met in general but the strategy is still useful as we discuss below.

Consider, for example, the graph in Fig. 6. Its associated S matrix is
shown in Fig. 10.


    S   |   1     2     3     4     5     6
    ----+------------------------------------
    1   |   0    0.66  0.25  0.25  0.83  0.83
    2   |  0.66   0    0.83  0.83  0.25  0.25
    3   |  0.25  0.83   0     0     1     1
    4   |  0.25  0.83   0     0     1     1
    5   |  0.83  0.25   1     1     0     0
    6   |  0.83  0.25   1     1     0     0

                    Fig. 10
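The "connectivity" dissimilarity can be sketched directly from the core sets. The function name is illustrative, and the Fig. 6 edge list is inferred from the core sets given with that figure. Exact values such as 2/3 and 5/6 appear in Fig. 10 as 0.66 and 0.83.

```python
def connectivity_distances(neighbors):
    """s_ij = 1 - |CS_i ∩ CS_j| / |CS_i ∪ CS_j|, rows in sorted node order."""
    nodes = sorted(neighbors)
    cs = {i: {i} | set(neighbors[i]) for i in nodes}   # core sets, self included
    return [[1 - len(cs[i] & cs[j]) / len(cs[i] | cs[j]) for j in nodes]
            for i in nodes]


# Graph assumed for Fig. 6: triangles {1,3,4} and {2,5,6} joined by link 1-2.
fig6 = {1: {2, 3, 4}, 2: {1, 5, 6}, 3: {1, 4}, 4: {1, 3}, 5: {2, 6}, 6: {2, 5}}
for row in connectivity_distances(fig6):
    print([round(x, 2) for x in row])
```

Because every core set contains its own node, the union is never empty and the ratio is always defined.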




The following circumstances can be noticed in Fig. 10:

i)   The rows (or columns) corresponding to o_3 and o_4 and to o_5 and o_6 are
     exactly identical.

ii)  s_34 = s_56 = 0, which violates metric property 1 (see section 3).

It can be shown in general that these circumstances, if they arise,
do so for the same pairs of related nodes, i.e.:

    s_ik = 0  ⟺  s_ij = s_kj ∀j
                                                                  (a)
    s_ik = 0  ⟹  o_i and o_k are related

To show this, we proceed as follows:

1)  s_ik = 0 ⟹ (by definition of s_ik) |CS_i ∩ CS_k| / |CS_i ∪ CS_k| = 1, i.e.:

        |CS_i ∩ CS_k| = |CS_i ∪ CS_k|, which is only possible when
        CS_i = CS_k.

    This implies a_ik = 1, i.e., nodes o_i and o_k are related: otherwise,
    o_k ∉ CS_i while (by definition of CS_k) o_k ∈ CS_k, so that CS_i = CS_k
    would not hold.

    Thus, s_ik = 0 ⟹ {CS_i = CS_k; o_i and o_k are related}.

    Now, by definition of s_ij and s_kj,

                     |CS_i ∩ CS_j|                  |CS_k ∩ CS_j|
        s_ij = 1 - --------------- ;   s_kj = 1 - ---------------
                     |CS_i ∪ CS_j|                  |CS_k ∪ CS_j|

    But if CS_i = CS_k, then s_ij = s_kj ∀j, and therefore

        s_ik = 0 ⟹ {s_ij = s_kj ∀j; o_i and o_k are related}.      (b)

2)  If we assume that s_ij = s_kj ∀j, this certainly holds for j = k:

        s_ik = s_kk.

    But s_kk = 0, and so s_ik = 0.

    Thus,

        s_ij = s_kj ∀j ⟹ s_ik = 0 (= s_ki because S is symmetric).   (c)

Equations (b) and (c) together produce equation (a). Q.E.D.


These are very nice properties of the resulting matrix S:

If it turns out that for some pair of nodes o_i and o_k, s_ik = 0 (i.e., CS_i
= CS_k), then it is true that s_ij = s_kj ∀j. In other words, nodes o_i and o_k are
equivalent with respect to the rest of the graph as described by the matrix S.
As far as we consider this matrix an appropriate distance matrix for cluster
analysis purposes, the two nodes in that pair can be collapsed into one. This is
intuitively appealing since (i) s_ik = 0 (as s_ii = 0 ∀i), and (ii) o_i and o_k are
related; in fact, it suggests that these nodes should be put in the same subgraph
in an eventual partition.

Thus, whenever we come across an s_ik = 0, we can delete one of the nodes
from further consideration, and assign it to whatever subgraph the other one ends
up in.

In the case of the graph shown in Fig. 6, doing so with the pairs
(o_3, o_4) and (o_5, o_6), and deleting o_4 and o_6, produces the distance matrix S
shown in Fig. 11a, corresponding to the "collapsed" graph depicted in Fig. 11b.




    S     |   1     2    3(4)  5(6)
    ------+--------------------------
    1     |   0    0.66  0.25  0.83
    2     |  0.66   0    0.83  0.25
    3(4)  |  0.25  0.83   0     1
    5(6)  |  0.83  0.25   1     0

                Fig. 11a


                Fig. 11b


If  this  is  done  for  all  the  possible  pairs  of  nodes  whose  computed 
distance  is  zero,  the  S matrix  is  reduced  in  size,  its  entries  are  made  to  meet 
the  first  condition  for  them  to  be  metrics  (see  section  3) , and  preliminary 
clustering  is  performed. 
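The collapsing step can be sketched as a single pass over the distance matrix. This is an illustrative sketch: the function name and the tolerance `tol` are assumptions, and the input is the matrix of Fig. 10 with its two-decimal entries.

```python
def collapse_zero_distance(labels, S, tol=1e-12):
    """Merge every node into the first earlier node at distance zero from it.

    Returns (groups, reduced): groups maps each retained label to the labels
    it stands for; reduced is S restricted to the retained nodes.
    """
    keep = []        # indices of retained (representative) nodes
    groups = {}
    for i in range(len(S)):
        rep = next((r for r in keep if S[r][i] <= tol), None)
        if rep is None:
            keep.append(i)
            groups[labels[i]] = [labels[i]]
        else:
            groups[labels[rep]].append(labels[i])
    reduced = [[S[i][j] for j in keep] for i in keep]
    return groups, reduced


# Distance matrix of Fig. 10.
fig10 = [
    [0,    0.66, 0.25, 0.25, 0.83, 0.83],
    [0.66, 0,    0.83, 0.83, 0.25, 0.25],
    [0.25, 0.83, 0,    0,    1,    1],
    [0.25, 0.83, 0,    0,    1,    1],
    [0.83, 0.25, 1,    1,    0,    0],
    [0.83, 0.25, 1,    1,    0,    0],
]
groups, reduced = collapse_zero_distance([1, 2, 3, 4, 5, 6], fig10)
print(groups)
for row in reduced:
    print(row)
```

Keeping the first node of each zero-distance pair matches the choice made in the text, where o_4 and o_6 are the ones deleted.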

With the matrix of Fig. 11a, most clustering algorithms would cluster
o_3 (o_4) with o_1 and o_5 (o_6) with o_2, the best partition for the graph of Fig. 6.


From a more general standpoint, it is interesting to observe that, in a
sense, the procedure described above treats the nodes of a graph as described by
their corresponding rows (columns) of the associated graph adjacency matrix; in
other words, it considers each node as described by a set of n attributes: its
relationships with all the nodes in the graph. Consequently, the resulting
distances (entries in the distance matrix S) tell us something about the overall
structure of the graph that was not clearly apparent in the original data (e.g.,
"collapsible" nodes). This is in contrast with traditional distance matrices, which
do not display such a structure as directly. In this sense, the results obtained
when setting k = 2 in the procedure of section 5.1.2 as applied to the graph in Fig. 4
(i.e., the two way partition {o_1, o_2}, {o_3, o_4}) reflect more accurately the
inherent structure of that graph. As it turns out, the strategy proposed above
resembles one described in [Curry 76] for computing distances between two objects
characterized by a set of attributes measured in a binary scale. The difference
with respect to our approach is that we consider the attributes to be the
relationships (i.e., similarities) between objects (nodes) that characterize the
actual set (graph) under decomposition, as opposed to some external set of
attributes.

This suggests that perhaps a similar strategy could be employed in general, to
make the intrinsic structure of the initial set more apparent in the matrices
used in cluster analysis, be they binary or not; we shall explore this possibility
in sections 5.3.2 and 5.3.3 below.

Finally, note that the resulting distance matrix is normalized: all its
entries assume values in the interval [0,1].



5.2.2.1.- An alternative approach to compute distance matrices when
preclustering takes place.

The strategy proposed in the preceding section for computing
dissimilarity (distance) matrices is basically a two-step process as
follows:

(1)  Compute distances between all pairs of nodes.

(2)  Check whether any computed distance is zero. If so, delete
     the row and column corresponding to the zero entry in the
     distance matrix.


Note  that  since  distances  are  computed  before  collapsing  nodes, 
they  correspond  to  the  structure  of  the  original  graph;  collapsed  nodes  are 
taken  as  separate  nodes  for  the  purposes  of  distance  computation. 


It can be shown in general, however, that any two nodes o_i and o_k
such that s_ik = 0 either appear both or do not appear at all in a given core
set CS_p:

By the definition of core set, it follows that

    o_i ∈ CS_k  ⟺  o_k ∈ CS_i,

because o_i ∈ CS_k ⟹ a_ki = 1 = a_ik ⟹ o_k ∈ CS_i and vice versa.

So, since s_ik = 0 ⟹ CS_i = CS_k, it follows that, when s_ik = 0:

    a)  o_i ∈ CS_p ⟹ o_p ∈ CS_i ⟹ o_p ∈ CS_k ⟹ o_k ∈ CS_p

    b)  o_i ∉ CS_p ⟹ o_p ∉ CS_i ⟹ o_p ∉ CS_k ⟹ o_k ∉ CS_p,  q.e.d.


This suggests that entries in the distance matrix can be alternatively
computed by considering collapsed nodes as single nodes, i.e., by
basically reversing the order of steps (1) and (2) above. The resulting
entries in the distance matrix will still be metrics, and they won't
coincide, in general, with entries computed as described above.

A simple example will make this point clearer: Consider the
graph in Figure 6 (page 22). The distance matrix computed as proposed in
the preceding section (after collapsing nodes) is that in Figure 11a.

The alternative distance matrix computation would proceed as
follows:

Upon recognition that nodes o_3 and o_4, and o_5 and o_6 collapse,
the graph in Fig. 6 is in fact reduced to that shown in Fig. 11b.

The core sets for this graph are as follows:

    CS_1:     {o_1, o_2, o_3(o_4)}
    CS_2:     {o_1, o_2, o_5(o_6)}
    CS_3(4):  {o_1, o_3(o_4)}
    CS_5(6):  {o_2, o_5(o_6)}

so that the resulting distance matrix would be that in Fig. 12.


    D     |   1     2    3(4)  5(6)
    ------+--------------------------
    1     |  0.00  0.50  0.33  0.75
    2     |  0.50  0.00  0.75  0.33
    3(4)  |  0.33  0.75  0.00  1.00
    5(6)  |  0.75  0.33  1.00  0.00

                Figure 12
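The alternative order (collapse first, then compute distances) can be sketched as follows. Nodes with identical core sets are merged into the node with the smallest label, and the connectivity distance is then computed on the reduced graph. The function names are illustrative, and the Fig. 6 edge list is inferred from the core sets given with that figure.

```python
def collapse_graph(neighbors):
    """Merge nodes with identical core sets; return the reduced neighbour dict."""
    cs = {i: frozenset({i} | set(neighbors[i])) for i in neighbors}
    first = {}                                  # core set -> smallest label
    for i in sorted(neighbors):
        first.setdefault(cs[i], i)
    rep = {i: first[cs[i]] for i in neighbors}  # node -> its representative
    reduced = {r: set() for r in set(rep.values())}
    for i, nbrs in neighbors.items():
        for j in nbrs:
            if rep[i] != rep[j]:
                reduced[rep[i]].add(rep[j])
    return reduced


def connectivity_distances(neighbors):
    """s_ij = 1 - |CS_i ∩ CS_j| / |CS_i ∪ CS_j|, rows in sorted node order."""
    nodes = sorted(neighbors)
    cs = {i: {i} | set(neighbors[i]) for i in nodes}
    return [[1 - len(cs[i] & cs[j]) / len(cs[i] | cs[j]) for j in nodes]
            for i in nodes]


# Graph assumed for Fig. 6: triangles {1,3,4} and {2,5,6} joined by link 1-2.
fig6 = {1: {2, 3, 4}, 2: {1, 5, 6}, 3: {1, 4}, 4: {1, 3}, 5: {2, 6}, 6: {2, 5}}
reduced = collapse_graph(fig6)          # nodes 1, 2, 3 (for 3,4), 5 (for 5,6)
for row in connectivity_distances(reduced):
    print([round(x, 2) for x in row])
```

On this example the entry for the pair (1, 3(4)) comes out as 1/3, against 0.25 when distances are computed before collapsing.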

The question thus arises: Which of the two strategies should be
employed to compute distance matrices for a graph?

First of all, note that the two strategies trivially coincide
when no collapsing of nodes occurs. On the other hand, it seems intuitively
appealing to use the first strategy when collapsing does occur, since the
distances so computed reflect more accurately the structure of the original
graph. For example, compare the graphs in Figs. 11b and 4. Using the
second strategy would produce the same distance matrix for these graphs.
This is intuitively wrong, since we know that in Fig. 11b, node o_3(4)
corresponds to two original nodes. The fact that d_1,3(4) in Fig. 11a
is less than d_1,3(4) in Fig. 12 confirms this intuition: the two collapsed
nodes o_3 and o_4, both close to o_1 in the original graph (Fig. 6), in fact
"pull" o_1 closer to the node resulting from collapsing o_3 and o_4 than it
would otherwise be, in a graph such as that of Fig. 4.

For these reasons, we will predominantly use the first strategy
to compute distance matrices for graphs in which collapsing occurs.


5.3.-  Graph  decomposition  techniques  applied  to  cluster  analysis. 


Section  5.2  discussed  some  strategies  for  constructing  a metric  distance 
matrix  from  a graph  adjacency  matrix,  so  that  the  graph  decomposition  problem  can 
be  treated  in  a cluster  analytic  framework.  In  this  section  we  explore  the 
opposite  move,  i.e.,  is  it  possible  to  represent  a cluster  analysis  problem  as 
a graph  decomposition  one?  The  motivation  for  this  is  that  if  such  a transformation 
were  possible,  some  of  the  techniques  suitable  for  graph  decomposition  problems 
like  the  heuristics  described  in  section  5.1.2  could  be  employed  in  cluster 
analytic  algorithms  and  their  effectiveness  investigated. 

In  general,  and  from  a strict  point  of  view,  the  answer  to  the  above 
question  is  no.  Of  course,  some  distance  matrices  can  be  made  to  correspond 
to a graph adjacency matrix (e.g., we know that the matrix in Fig. 9 corresponds
to the graph in Fig. 6), but there is no straightforward algorithm that we know
of  which  would  permit  us  even  to  assess  whether  or  not  a given  distance  matrix 
corresponds  to  some  graph,  in  general,  let  alone  what  that  graph  would  be. 
Nevertheless,  we  still  feel  that  some  insight  can  be  gained  from  exploring  this 
move.  The  next  section  proposes  a way  of  applying  the  ideas  behind  the  heuristics 
of  section  5.1.2  to  cluster  analysis  problems.  Prior  to  that  discussion,  however, 
we  point  out  a crude  method  for  putting  cluster  analysis  problems  in  graph  form. 
This approach, which was proposed in [Hubert 74], works as follows:

Given an nxn distance matrix S(s_ij), choose a threshold value T > 0 and
define

    s'_ij = 1  if s_ij < T,
          = 0  otherwise.

We can then think of the matrix S'(s'_ij) as the adjacency matrix of a
graph in which pairs of related nodes correspond to objects in the original
problem whose distance is less than T. Of course, this transformation loses
a great deal of information, because all the resulting s'_ij's are either 0 or 1,
while the original s_ij's, in general, would vary more smoothly in a wider
interval; furthermore, different T values can produce different graphs. However,
this strategy can be appropriate when the original entries in S take values
that are either very high or very low, by choosing a T value in between. The
resulting graph can be drawn and maybe even visual insight employed to identify
an initial partition.
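The thresholding transformation amounts to one comparison per entry. In the sketch below the strict comparison follows the phrase "distance is less than T"; the function name and the example matrix are hypothetical and chosen so that distances are either clearly low or clearly high.

```python
def threshold_graph(S, T):
    """Return the 0/1 adjacency matrix with a one wherever s_ij < T."""
    if T <= 0:
        raise ValueError("T must be positive")
    return [[1 if s < T else 0 for s in row] for row in S]


# Hypothetical distance matrix with two well separated groups of objects.
S = [
    [0, 1, 9, 8],
    [1, 0, 9, 9],
    [9, 9, 0, 2],
    [8, 9, 2, 0],
]
for row in threshold_graph(S, T=5):
    print(row)
```

Since s_ii = 0 < T, the diagonal of the result is all ones, which agrees with the self-loop convention used for adjacency matrices in section 5.1.2.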

A more  sophisticated  approach  could  proceed  as  follows: 

1)  Choose several T values T_1, T_2, ..., T_m.

For each T value,

2)  Derive a graph as described above.

3)  Using one of the transformations in 5.2.1 or 5.2.2, construct a distance
    matrix for the graphs generated in 2.

4)  Conduct a statistical comparison test between the distance matrices
    computed in 3 and the original one.

5)  Retain the T value that produced the graph which resulted in the best
    fit in 4.

This, however, would probably be very time consuming and not worth the
effort if the output were to be only an initial partition. A more direct approach
that takes advantage of the heuristics discussed in section 5.1.2 is proposed in the
next section.



5.3.1 - A strategy for constructing initial partitions in non-agglomerative
cluster analysis.


As was pointed out in section 4.2, no particularly "good" methods to
identify initial partitions for non-agglomerative cluster analysis techniques
are available. We think that a potentially appropriate method can result
from the application of a strategy similar to that employed to construct
kernel subsets in the graph case (see 5.1.2).

We describe one such strategy below. Its parallelism with the
procedure in 5.1.2 should be apparent.

Given an nxn distance matrix S(s_ij) defined over a set of n objects
O: {o_1, o_2, ..., o_n}, this procedure would proceed as follows:

1)  Pick a threshold value T, 0 < T < max(S).

2)  Set J = 0.

3)  For each o_i ∈ O, compute:

        CS_i = {o_j | o_j ∈ O and s_ij ≤ T}.

        c_i = |CS_i| - 1.

4)  Consider the k > 1 objects with highest c_i. Without loss of generality,
    assume that these objects are o_1, ..., o_k.
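Steps 1 to 4 can be sketched as below. The function name, the tie-breaking by object index, and the example matrix are assumptions made for illustration. The sketch relies on s_ii = 0, so that each object belongs to its own core set.

```python
def thresholded_core_sets(S, T, k):
    """Return (core sets, connectivities, indices of the k most connected objects)."""
    n = len(S)
    biggest = max(max(row) for row in S)
    if not 0 < T < biggest:
        raise ValueError("T must satisfy 0 < T < max(S)")
    cs = [{j for j in range(n) if S[i][j] <= T} for i in range(n)]   # step 3
    c = [len(s) - 1 for s in cs]
    top = sorted(range(n), key=lambda i: (-c[i], i))[:k]             # step 4
    return cs, c, top


# Hypothetical distance matrix: objects 0-2 are close, as are objects 3-4.
S = [
    [0, 1, 2, 8, 9],
    [1, 0, 1, 9, 8],
    [2, 1, 0, 7, 9],
    [8, 9, 7, 0, 1],
    [9, 8, 9, 1, 0],
]
cs, c, top = thresholded_core_sets(S, T=2, k=2)
print(cs)
print(c)
print(top)
```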


From this point on, this procedure is very similar to one proposed
in [Astrahan 70] for identifying leader objects (see 4.2). Since Astrahan's
technique is considered to be one of the most intuitively sound for these
purposes (see [Anderberg 73]), we believe that the one described above would
also perform well; furthermore, it requires fewer a priori parameters than
Astrahan's (only two, T and k, as compared with three required by Astrahan's
method).


5.3.2 - Working with the similarity matrix as a whole prior to applying
clustering algorithms; normalization of distance matrices.

In  this  section  we  propose  a strategy  for  transforming  a given 
similarity  (or  dissimilarity)  matrix  into  a new,  normalized  one,  by  means  of 
a procedure  similar  to  that  described  in  section  5.2.2  for  transforming 
adjacency  matrices  into  distance  matrices. 

For exposition purposes, it is useful to describe the computations
discussed in 5.2.2 in a more generalizable way.

Recall (section 5.2.2) that the entries in the resulting distance
matrix were computed as

                 |CS_i ∩ CS_j|
    s_ij = 1 - ---------------
                 |CS_i ∪ CS_j|

where CS_i stands for the "core set" of node o_i as defined in 5.1.2. New
expressions for |CS_i ∩ CS_j| and |CS_i ∪ CS_j| can be derived as follows:

For each o_i ∈ O (the original set of nodes), define a vector
v_i = (a_i1, a_i2, ..., a_in), i.e., the row of the adjacency matrix
corresponding to node o_i; recall that a_ii = 1 ∀i, here.

It is clear now that

    |CS_i ∩ CS_j| = v_i · v_j, the inner product of vectors v_i and v_j;

recall that their components are either zero or one.

Furthermore, since

    |CS_i ∪ CS_j| = |CS_i| + |CS_j| - |CS_i ∩ CS_j|, and

    |CS_i| = v_i · v_i, it follows that

    |CS_i ∪ CS_j| = v_i · v_i + v_j · v_j - v_i · v_j.

Thus, the expression for s_ij can now be written

                          v_i · v_j
    s_ij = 1 - -----------------------------------
                v_i · v_i + v_j · v_j - v_i · v_j

Note that:

1)  When v_i = v_j, s_ij = 0;

2)  When v_i · v_j = 0 (i.e., v_i and v_j are orthogonal), s_ij = 1.
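The inner-product form can be checked numerically on rows of an adjacency matrix. The helper names are illustrative. The rows used are those of the graph assumed for Fig. 6 (edge list inferred from the core sets given with that figure), with ones on the diagonal.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def row_distance(v_i, v_j):
    """s_ij = 1 - v_i.v_j / (v_i.v_i + v_j.v_j - v_i.v_j) for 0/1 rows."""
    inner = dot(v_i, v_j)
    return 1 - inner / (dot(v_i, v_i) + dot(v_j, v_j) - inner)


# Adjacency rows (self-loops included) for nodes 1, 2, 3 and 5 of Fig. 6.
v1 = [1, 1, 1, 1, 0, 0]
v2 = [1, 1, 0, 0, 1, 1]
v3 = [1, 0, 1, 1, 0, 0]
v5 = [0, 1, 0, 0, 1, 1]

print(row_distance(v1, v2))   # core sets share 2 of 6 nodes
print(row_distance(v3, v3))   # identical rows
print(row_distance(v3, v5))   # orthogonal rows
```

The denominator cannot vanish here because the self-loop makes v_i · v_i at least 1.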


With this formalization, by analogy, a given similarity matrix S
can be transformed into a normalized distance matrix D(d_ij) as follows:

- Define n vectors s_i*, i = 1, ..., n, as follows:

      s_i* = (s_i1, s_i2, ..., s_in).

- Compute D's entries as:

                            s_i* · s_j*
      d_ij = 1 - -----------------------------------------
                  s_i* · s_i* + s_j* · s_j* - s_i* · s_j*

Unfortunately, the distances $d_{ij}$ computed this way cannot be shown
to be metrics, due to the fact that the components of the vectors $s_{i*}$ are not
binary as they were with $v_i$. (In general, they do not meet condition (4) of
section 3.) An adjustment can be made, however, to overcome this problem. If
we define the entries in a distance matrix $D^*(d^*_{ij})$ as:

$$d^*_{ij} = 1 - \frac{s_{i*} \odot s_{j*}}{s_{i*} \odot s_{i*} + s_{j*} \odot s_{j*} - s_{i*} \odot s_{j*}},$$

where [the defining expression for $s_{i*} \odot s_{j*}$ is illegible in the source],
it can be shown that the entries in $D^*$ are metrics (see Appendix III).

Actually, assuming that $s_{ij} \geq 0 \ \forall i,j$, as it would be in most
similarity matrices, the above definition can be changed to

$$s_{i*} \odot s_{j*} = \sum_{k=1}^{n} \min(s_{ik}, s_{jk}),$$

and the entries in $D^*$ would still be metrics.

This adjustment intuitively follows the strategy used in the graph
case: Given two vectors $v_i$ and $v_j$, we compute $v_i \cdot v_j$ as

$$v_i \cdot v_j = \sum_{p=1}^{n} s_{ip} s_{jp},$$

but since $s_{ip} \in \{0,1\} \ \forall i,p$, what we really do is the following: If both nodes
$o_i$ and $o_j$ are related to $o_p$, we set $s_{ip} s_{jp}$ to 1, but to 0 if either one (or both)
are not related to $o_p$. In the new setting, the definition of $s_{i*} \odot s_{j*}$ tries to
do the same: since the $s_{ij}$'s are similarity measures, when we come across
a pair of objects $(o_i, o_j)$ not equally similar to a third $o_p$, we are "conservative"
and use only the smaller similarity measure.

The distance matrix $D^*$ computed as above can be used in any
cluster analysis algorithm.
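The min-based normalization can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `min_sum` and `normalized_distance` and the 3 x 3 similarity matrix are hypothetical, and the input is assumed to be non-negative with a positive main diagonal so that no denominator vanishes.

```python
def min_sum(u, v):
    """Sum over k of min(u_k, v_k), the min-based product described above."""
    return sum(min(x, y) for x, y in zip(u, v))


def normalized_distance(similarity):
    """Map a non-negative similarity matrix S to the normalized matrix D*."""
    n = len(similarity)
    self_sum = [min_sum(row, row) for row in similarity]   # equals the row sum
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            common = min_sum(similarity[i], similarity[j])
            dist[i][j] = 1.0 - common / (self_sum[i] + self_sum[j] - common)
    return dist


# Hypothetical similarity matrix with a consistent main diagonal.
S = [[1.0, 0.8, 0.1],
     [0.8, 1.0, 0.2],
     [0.1, 0.2, 1.0]]
D_star = normalized_distance(S)
```

For this input the entries of `D_star` lie between 0 and 1, the diagonal is zero, and the most similar pair (objects 1 and 2) receives the smallest off-diagonal distance.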

The  following  observations  need  to  be  made  at  this  point: 


a) Most similarity matrices constructed for cluster analysis
purposes pay no attention to their main diagonal entries, since they are not
used in cluster analysis algorithms. The normalization procedure described
above requires that these entries be consistent with the remaining ones,
because they are used in the normalization process. More concretely, in a
given similarity matrix $S(s_{ij})$, it should be true that $s_{ii} = s_{jj} \ \forall i,j$
and that $s_{ii} \geq \max\{s_{ij} \in S\}$. With this provision, if any $d^*_{ij}$ computed
as indicated above turns out to be zero, objects $o_i$ and $o_j$ can be collapsed
into a new one (and thus clustered together), as was done in the graph case.
If these conditions are not met by the entries in $S$, the normalization procedure
can produce misleading results: For example, assume that $s_{ii} = s_{jj} = s_{ij} =
\min\{s_{ij} \in S\}$, and that $s_{i*} = s_{j*}$; this would lead to $d^*_{ij} = 0$ and thus to one
cluster containing $o_i$ and $o_j$, but $o_i$ and $o_j$ are the most dissimilar pair of
objects in $O$, since $s_{ij} = \min\{s_{ij} \in S\}$!

13  ID 

b)  "Collapsable"  objects  can  still  result.  Since  such  objects  will 

end  up  in  the  same  cluster,  the  normalization  procedure  also  gives  more 

"apparent  structure"  to  the  original  data. 

c) The discussion above assumed that $S$ was a similarity matrix. If it
was a dissimilarity one, we can still apply the procedure by first defining
$B = \max\{s_{ij} \in S\}$, then computing $S'(s'_{ij})$ by letting $s'_{ij} = B - s_{ij}$, and working
with $S'$. Note that for $S'$ to meet the conditions set forth in (a) above, $S$
should be such that $s_{ii} = s_{jj} \ \forall i,j$ and $s_{ii} \leq \min\{s_{ij} \in S, i \neq j\}$ (i.e.,
a given object shouldn't be more dissimilar (less similar) to itself than
to another object).

d) By letting

$$d'_{ij} = 1 - d^*_{ij},$$

the resulting matrix would be a normalized similarity matrix. So, the procedure
described above can produce a normalized similarity (or dissimilarity) matrix
out of a similarity (or dissimilarity) one.


5.3.2.1 - Iterative computation of distance matrices.


The nature of the strategy proposed above for computing normalized
distance (or similarity) matrices from distance (or similarity) matrices is
such that nothing precludes us from using it iteratively, i.e.: computing a
normalized distance matrix, then using it as input to compute a second one,
and so on. This possibility brings up interesting questions. For example:


• Does the iterative application of the procedure eventually
converge to a "stable" matrix?

• If collapsable objects can result at each iteration, would
this iterative procedure behave as a hierarchical clustering
algorithm?

These questions are, at least in principle, of more theoretical
than practical interest, given that such an iterative procedure
is likely to be slow (at each iteration an entire new matrix is computed).
Nevertheless, they are relevant in order to gain insight into how well the
distance matrices computed as proposed convey the overall structure of the
set of objects under analysis.

Although  we  haven't  approached  the  above  questions  from  an  analytical 
viewpoint,  we  have  conducted  some  experiments  which  seem  to  indicate  that 
the  answer  to  both  questions  is  yes. 

We  illustrate  this  by  means  of  a few  examples  below. 

Prior to introducing these examples, the following observation
needs to be made:

• The nature of metric distance matrices is such that it precludes
the result $d_{ij} = 0$ with $i \neq j$ from occurring (recall that one metric
property precisely states that $d_{ij} = 0 \Leftrightarrow i = j$). Therefore, we would never
be capable of collapsing objects in the iterative procedure that we are
about to explore. However, it turns out that, as the iterations proceed,
there are distances (entries in the computed distance matrix) which keep
decreasing and taking values closer and closer to zero. For our purposes
here, we allow the specification of a parameter ε, close to zero, to be
used in conjunction with the iterative procedure as follows: whenever a
computed distance $d_{ij}$ is less than ε, objects $o_i$ and $o_j$ are clustered together.


Some Examples


Example  1 

The first example was chosen to be trivial from a decomposition
viewpoint; its main thrust is to show that the iterative procedure behaves
as expected in the simplest case imaginable. Consider the graph in Fig. 13:

[Figure 13: graph consisting of two disjoint four-node subgraphs]

It is not completely connected and consists of two disjoint subgraphs. We
would expect a decomposition procedure to result in the partition {1,2,3,4},
{5,6,7,8}. Below we show that the iterative procedure does indeed obtain this
partition; in addition, there are other interesting observations to be made.

The adjacency matrix corresponding to the graph in Fig. 13 is
shown below. Note that this matrix can be viewed as its initial similarity
matrix; it is interesting to compare it with the current similarity matrix
after a number of iterations.

     1  2  3  4  5  6  7  8
1    1  1  0  1  0  0  0  0
2    1  1  1  0  0  0  0  0
3    0  1  1  1  0  0  0  0
4    1  0  1  1  0  0  0  0
5    0  0  0  0  1  1  0  1
6    0  0  0  0  1  1  1  0
7    0  0  0  0  0  1  1  1
8    0  0  0  0  1  0  1  1

After four iterations, the similarity matrix looks as follows:

       1     2     3     4     5     6     7     8
1    1.00  0.94  0.94  0.94  0.00  0.00  0.00  0.00
2    0.94  1.00  0.94  0.94  0.00  0.00  0.00  0.00
3    0.94  0.94  1.00  0.94  0.00  0.00  0.00  0.00
4    0.94  0.94  0.94  1.00  0.00  0.00  0.00  0.00
5    0.00  0.00  0.00  0.00  1.00  0.94  0.94  0.94
6    0.00  0.00  0.00  0.00  0.94  1.00  0.94  0.94
7    0.00  0.00  0.00  0.00  0.94  0.94  1.00  0.94
8    0.00  0.00  0.00  0.00  0.94  0.94  0.94  1.00

After four more iterations, it appears like:

       1     2     3     4     5     6     7     8
1    1.00  1.00  1.00  1.00  0.00  0.00  0.00  0.00
2    1.00  1.00  1.00  1.00  0.00  0.00  0.00  0.00
3    1.00  1.00  1.00  1.00  0.00  0.00  0.00  0.00
4    1.00  1.00  1.00  1.00  0.00  0.00  0.00  0.00
5    0.00  0.00  0.00  0.00  1.00  1.00  1.00  1.00
6    0.00  0.00  0.00  0.00  1.00  1.00  1.00  1.00
7    0.00  0.00  0.00  0.00  1.00  1.00  1.00  1.00
8    0.00  0.00  0.00  0.00  1.00  1.00  1.00  1.00

where entry values have been rounded to two decimal places.

If we compare this matrix with the original similarity (adjacency) matrix,
it is obvious that the graph structure has been made much more apparent.

Taking ε = 0.001, the expected partition is obtained after three
more iterations. Furthermore, the matrix becomes completely stable. After
this iteration, the current distance matrix has only two rows and two
columns, corresponding to the two subgraphs. It looks like:

       1     2
1    0.00  1.00
2    1.00  0.00

It is interesting to note that the entries other than the main
diagonal entries have the maximum possible value (1; recall that distance
matrices are normalized), indicating that the two subgroups identified are
completely independent for our purposes; this is intuitively correct given
that they were disjoint from the start.
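A run of this kind can be imitated with a short Python sketch. Several points here are assumptions rather than details given in the report: each computed distance matrix is fed back as a similarity matrix through $1 - d_{ij}$, objects are grouped only at the end (whenever a chain of distances below ε links them) instead of being collapsed during the iterations, and the function names are hypothetical. Iteration counts obtained this way need not coincide with those reported above.

```python
def min_sum(u, v):
    """Sum of component-wise minima of two rows."""
    return sum(min(x, y) for x, y in zip(u, v))


def normalized_distance(similarity):
    """One normalization pass: similarity matrix -> normalized distance matrix."""
    n = len(similarity)
    self_sum = [min_sum(row, row) for row in similarity]
    return [[1.0 - min_sum(similarity[i], similarity[j])
             / (self_sum[i] + self_sum[j] - min_sum(similarity[i], similarity[j]))
             for j in range(n)] for i in range(n)]


def iterate_distance(similarity, iterations):
    """Repeat the normalization, using 1 - d as the next similarity matrix."""
    dist = normalized_distance(similarity)
    for _ in range(iterations - 1):
        similarity = [[1.0 - x for x in row] for row in dist]
        dist = normalized_distance(similarity)
    return dist


def groups_below(dist, eps):
    """Group objects connected by distances smaller than eps."""
    n = len(dist)
    label = list(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] < eps and label[i] != label[j]:
                old, new = label[j], label[i]
                label = [new if x == old else x for x in label]
    groups = {}
    for obj, lab in enumerate(label):
        groups.setdefault(lab, []).append(obj)
    return sorted(groups.values())


# Adjacency matrix of the graph in Fig. 13 (nodes renumbered from 0).
FIG13 = [[1, 1, 0, 1, 0, 0, 0, 0],
         [1, 1, 1, 0, 0, 0, 0, 0],
         [0, 1, 1, 1, 0, 0, 0, 0],
         [1, 0, 1, 1, 0, 0, 0, 0],
         [0, 0, 0, 0, 1, 1, 0, 1],
         [0, 0, 0, 0, 1, 1, 1, 0],
         [0, 0, 0, 0, 0, 1, 1, 1],
         [0, 0, 0, 0, 1, 0, 1, 1]]

final = iterate_distance(FIG13, 15)
clusters = groups_below(final, 0.001)
```

Under these assumptions the distances within each four-node subgraph shrink towards zero while the distances between the two subgraphs stay at 1, so `clusters` separates nodes 0 to 3 from nodes 4 to 7.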


Example  2 

The next example takes a less trivial case. We use the graph in
Fig. 6, reproduced in Fig. 14.

Taking an ε value of 0.001, the distance matrix becomes stable
after 20 iterations. Two subgraphs are identified, {1,3,4} and {2,5,6}.
The final distance matrix is shown below:


Note that in contrast with the preceding example, entries not in the main
diagonal take values below 1, indicating that the two subgroups were not
originally disjoint.


Example 3

We saw in the previous example that when the identified subgraphs
are not originally disjoint, the final distance matrix reflects this
circumstance by setting non-main diagonal entries to values less than 1. A natural
question to ask is whether this intuitive behavior continues when the coupling
between identified subgraphs varies in importance. This example and the
next explore this question.

The graph in Fig. 15 differs from that in Fig. 14 in the number
of links between the two eventual subgraphs.

[Figure 15]

With an ε value of 0.001 as before, the iterative procedure produces the
same partition {1,3,4}, {2,5,6} after 16 iterations, but the final distance
matrix is as follows:


       1     2
1    0.00  0.13
2    0.13  0.00

Note that compared with that in Example 2, it behaves as expected.

Example  4 

The graph in Fig. 16 represents another step towards more coupling
between subgraphs.

[Figure 16]

After 18 iterations, the same partition is identified but the final distance
matrix indicates that the two subgraphs are closer together:

       1     2
1    0.00  0.08
2    0.08  0.00

* * *


The preceding examples have shown that the proposed iterative
procedure works in an intuitively appealing way. They raise some
questions, however. For example:

(a) What about ε values?

(b) Does the procedure always end up with two subgraphs
(subsets)?

We have explored these questions working with several additional examples.
Our results are summarized below:

(a) The chosen ε value doesn't seem to matter much as long as it is
reasonably close to zero. For instance, all the examples above reach the
same final result with ε = 0.01, 0.001 and 0.0001. Of course, the smaller ε,
the larger the number of iterations needed to achieve stability in the distance
matrix. Moreover, as ε increases, the intermediate clustering may vary
although the final result remains the same. If ε is increased substantially,
the final result may change significantly. For example, taking ε = 0.09
in Example 4 above (see the final distance matrix, in which the largest entry
is 0.08) would result in clustering the entire graph together.

(b) The procedure doesn't always end up with two subgraphs, although this
is the most common result. Exceptions have been experienced when:

1. There are more than two disjoint subgraphs initially; as many
subgraphs are identified whatever the ε value employed.

2. The graph is so compact that it is not partitioned at all.
This may depend upon ε, as suggested above. However, our
experience indicates that if no partition is identified
with a reasonably small ε, the initial graph or set is indeed
too compact to be partitioned in any way.


The fact that more than two subgraphs (subsets) are not identified
unless disjoint subgraphs exist at the beginning is somewhat disturbing.
We have observed the following behavior, however: Whenever the result is
in the form of two subgraphs, either one of them or both are such that, taken
independently, they are not partitioned by the procedure. This suggests that the
following strategy can be used to partition a given graph using this iterative
approach:

1) Set "current graph" to be the initial graph;

2) Use the iterative technique to partition it;

3) If the outcome is no partition, save the "current
graph" as a final subgraph. Go to 5;

4) Otherwise, put the two (or more) resulting subgraphs in a
"subgraph pool".

5) Check the "subgraph pool". If it is empty, stop: The subgraphs
saved at step (3) represent a partition. If not, take one
subgraph from the pool. Make it the "current graph". Go to 2.

We have used this strategy with a number of graphs; the results have been
encouraging.
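The five-step strategy above amounts to a simple work-list loop. The Python sketch below illustrates only that control flow: `decompose` is a hypothetical name, and `split_at_largest_gap` is a toy stand-in for the iterative technique of step 2 (it splits a list of integers at its largest gap), chosen so that the example runs on its own.

```python
def decompose(initial, partition_once):
    """Apply `partition_once` repeatedly until no subset can be split further.

    `partition_once(subset)` must return a list of subsets; a list with a
    single element means "no partition".
    """
    final = []                      # subgraphs saved at step 3
    pool = [list(initial)]          # step 1: the initial graph is the first "current graph"
    while pool:                     # step 5: stop when the pool is empty
        current = pool.pop()
        parts = partition_once(current)     # step 2
        if len(parts) <= 1:
            final.append(current)           # step 3: keep as a final subgraph
        else:
            pool.extend(parts)              # step 4: resulting subgraphs go to the pool
    return final


def split_at_largest_gap(values):
    """Toy partitioner (an assumption for the demo): cut at the largest gap > 1."""
    ordered = sorted(values)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    if not gaps or max(gaps) <= 1:
        return [ordered]
    cut = gaps.index(max(gaps)) + 1
    return [ordered[:cut], ordered[cut:]]


parts = decompose([1, 2, 3, 10, 11, 20], split_at_largest_gap)
```

With the toy partitioner, the list is first split into {1, 2, 3, 10, 11} and {20}, and the first piece is then split again, which mirrors how a two-way procedure can still yield more than two final subsets.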

In summary, two main comments can be made about this approach:

• It requires a single parameter: ε. It seems to be very robust with
respect to different "reasonable" ε values.

• The final result gives a "feel" for how interrelated the
identified subsets are, through the final distance matrix.


5.3.3 - The strategy of section 5.3.1 revised.


Now that we have developed a generalization of the normalization
procedure originally introduced for the binary matrix case, we are in a position
where the strategy of section 5.3.1 for obtaining initial partitions in a
non-agglomerative cluster analysis algorithm can be revised in a very appealing
way. The revision permits us to eliminate the need for the threshold parameter
T,  whose  choice  was  not  obvious;  also,  by  using  a percentage  parameter  p of  the 
type  suggested  at  the  end  of  section  5.1.2,  we  eliminate  the  need  for 

If we keep in mind that one possible generalization of the expression
$s_{i*} \cdot s_{j*}$ (in the terminology of the preceding section) is

$$s_{i*} \odot s_{j*} = \sum_{k=1}^{n} \min(s_{ik}, s_{jk}),$$

the following procedure can be used (note the parallelism with that in
section 5.1.2), where we assume $S(s_{ij})$ to be a similarity matrix (see Appendix IV
for a way of converting dissimilarity matrices into similarity ones so that
the procedure below is generally applicable):

Given $p$, a percentage parameter,

0) $J = 0$.

1) Compute

$$c_i = \sum_{q \neq i} s_{iq}, \qquad |CS_i| = c_i + s_{ii}.$$

2) Compute

$$c_0 = \max_{i=1,\ldots,n} c_i.$$

Select all objects $o_j$ such that

$$c_j \geq \frac{p}{100} \, c_0.$$

Assume there are $k$ such nodes; without loss of generality, let
them be the nodes $o_1, \ldots, o_k$.

If $k = n$, set $J = J+1$, KESU($J$) = $O$ and stop.

3) For all $o_i \in \{o_1, \ldots, o_k\}$ compute

$$|KS_i| = \sum_{q=1}^{n} \min\Big(s_{iq}, \max_{j=1,\ldots,k;\ j \neq i} (s_{jq})\Big).$$

4) Select $o_r$ such that

$$|KS_r| = \min_{j=1,\ldots,k} (|KS_j|).$$

5) $J = J + 1$.

If $|KS_r| = |CS_r|$, set KESU($J$) = $O$ and stop;
else, set KESU($J$) = $\{o_r\} \cup \pi$, where

$$\pi = \{o_t \mid o_t \text{ s.t. } s_{rt} > \max_{j=1,\ldots,k;\ j \neq r} (s_{jt}),\ t = 1, \ldots, n\}.$$

6) Recompute $O$: $O = O - $ KESU($J$); if $|O| = 0$, stop.

7) Recompute $S$: each entry $s_{ij}$ keeps its old value iff $o_i, o_j \in O$,
and is marked "nonexistent" otherwise.

Go to 1.


Two examples of this procedure are illustrated below. They refer
to the same problem but use different matrices. The first step in both is
to transform a distance matrix into a similarity matrix as indicated in
Appendix IV. Although this is not really necessary, since we can always change
"max" to "min", reverse the inequalities, and vice versa in the above procedure
for it to be applicable to dissimilarity matrices, we do it for clarity. A
parameter $p = 80$ is assumed in both.


A) The matrix in Fig. 8, corresponding to the "minimum path" metric
distances of the graph in Fig. 6, is transformed to the similarity matrix shown
in Fig. 17 using the strategy of Appendix IV.

     1  2  3  4  5  6
1    3  2  2  2  1  1
2    2  3  1  1  2  2
3    2  1  3  2  0  0
4    2  1  2  3  0  0
5    1  2  0  0  3  2
6    1  2  0  0  2  3

Fig. 17


The procedure then proceeds as follows:

0) $J = 0$.

1) $c_1 = c_2 = 8$, $c_3 = c_4 = c_5 = c_6 = 5$.
   $|CS_1| = |CS_2| = 11$, $|CS_3| = |CS_4| = |CS_5| = |CS_6| = 8$.

2) $c_0 = \max(8, 5) = 8$;
   since $0.8 \times 8 = 6.4 > 5$, $k = 2$, and $o_1$ and $o_2$ are selected.

3) $|KS_1| = |KS_2| = 8$ ($< 11$).

4) $o_r = o_1$ (arbitrarily).

5) $J = 1$,
   $|KS_1| < |CS_1|$,
   KESU(1) = $\{o_1, o_3, o_4\}$.

6) $O = \{o_2, o_5, o_6\}$.

7) Delete rows and columns corresponding to $o_1$, $o_3$, $o_4$.

Returning to steps 1 and 2, $o_2$, $o_5$ and $o_6$ are selected.
Since $k = n$ here, $J = 2$ and

KESU(2) = $\{o_2, o_5, o_6\}$,

and the procedure terminates.

Thus,  the  graph  in  Fig.  6 is  partitioned  in  the  best  possible  way. 
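The worked example above can be retraced with a short Python sketch of steps 0 to 7. It is a hedged illustration, not the report's program: the function name `initial_partition` is hypothetical, objects are numbered from 0, ties in step 4 are broken by taking the lowest index, and the case of a single selected object (k = 1), which the procedure does not spell out, is handled by assuming an empty maximum equals 0.

```python
def initial_partition(similarity, p):
    """Return the list of subsets KESU(1), KESU(2), ... for percentage p."""
    active = list(range(len(similarity)))    # current object set O
    result = []
    while active:
        # Step 1: c_i over the remaining objects, and |CS_i| = c_i + s_ii.
        c = {i: sum(similarity[i][q] for q in active if q != i) for i in active}
        cs = {i: c[i] + similarity[i][i] for i in active}
        # Step 2: select objects whose c_i reaches p percent of the maximum.
        c0 = max(c.values())
        selected = [i for i in active if c[i] >= p / 100.0 * c0]
        if len(selected) == len(active):
            result.append(set(active))
            break

        def best_other(i, q):
            # Largest similarity to o_q among the other selected objects
            # (assumption: 0 when there is no other selected object).
            return max((similarity[j][q] for j in selected if j != i), default=0.0)

        # Step 3: |KS_i| for every selected object.
        ks = {i: sum(min(similarity[i][q], best_other(i, q)) for q in active)
              for i in selected}
        # Step 4: object with the smallest |KS_i| (lowest index on ties).
        r = min(selected, key=lambda i: ks[i])
        # Step 5: stop, or build the subset around o_r.
        if ks[r] == cs[r]:
            result.append(set(active))
            break
        block = {r} | {t for t in active if similarity[r][t] > best_other(r, t)}
        result.append(block)
        # Steps 6 and 7: drop the assigned objects and go back to step 1.
        active = [i for i in active if i not in block]
    return result


# Similarity matrix of Fig. 17 (objects o_1..o_6 become indices 0..5).
FIG17 = [[3, 2, 2, 2, 1, 1],
         [2, 3, 1, 1, 2, 2],
         [2, 1, 3, 2, 0, 0],
         [2, 1, 2, 3, 0, 0],
         [1, 2, 0, 0, 3, 2],
         [1, 2, 0, 0, 2, 3]]

subsets = initial_partition(FIG17, 80)
```

With $p = 80$ the sketch yields the index sets {0, 2, 3} and {1, 4, 5}, i.e. $\{o_1, o_3, o_4\}$ and $\{o_2, o_5, o_6\}$, matching the two subsets obtained by hand.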

B) For the same graph in Fig. 6, we can use the normalized matrix in
Fig. 10, which was derived in a different way. Transforming it into a
similarity matrix as before yields the matrix in Fig. 18.

[Fig. 18]


Operating upon this matrix, the procedure unfolds as follows:

0) $J = 0$.

1) $c_1 = c_2 = 1.26$, $c_{3(4)} = c_{5(6)} = 0.92$.
   $|CS_1| = |CS_2| = 2.26$, $|CS_{3(4)}| = |CS_{5(6)}| = 1.92$.

2) $c_0 = \max(1.26, 0.92) = 1.26$;
   since $0.8 \times 1.26 = 1.008 > 0.92$, $o_1$ and $o_2$ are selected.

3) $|KS_1| = |KS_2| = 1.02$ ($< 2.26$).

4) $o_r = o_1$ (arbitrarily).

5) $J = 1$,
   $|KS_1| < |CS_1|$,
   KESU(1) = $\{o_1, o_{3(4)}\}$.

6) $O = \{o_2, o_{5(6)}\}$.

7) Delete rows and columns corresponding to $o_1$ and $o_{3(4)}$.

1) $c_2 = c_{5(6)} = 0.75$, $|CS_2| = |CS_{5(6)}| = 1.75$.

2) $c_0 = 0.75$, and $o_2$ and $o_{5(6)}$ are selected.
   Since $k = n$ here, $J = 2$ and
   KESU(2) = $\{o_2, o_{5(6)}\}$,

and the procedure terminates.

Note that the obtained result, taking into account the preliminary
clustering performed as a byproduct while computing the normalized dissimilarity
matrix of Fig. 10, coincides with the result in (A) above, so that the best
partition for the graph in Fig. 6 is again identified.


6.  - Other  approaches  and  problems. 

Section 5 assumed that the graph under decomposition was "undirected"
(i.e., its adjacency matrix was symmetric). When the links of a graph bear a
certain direction, the graph is called "directed," or sometimes "digraph."

Note that in the context of our problem a directed graph would mean that a certain
requirement may be related to another, but that the opposite is not necessarily
true. It turns out that the problem of digraph decomposition is easier to
solve than the general graph decomposition problem. Several effective heuristic
approaches have been proposed for this purpose (see, for instance, [Haralick 74]
or [Boesch & Gimpel 77]).

A formal treatment of the digraph decomposition problem can be
found in [Kevorian & Snoek 71]. They propose a methodology that decomposes
digraphs in a hierarchical fashion; the resulting hierarchy explicitly points
out how the identified subgraphs interact, as a consequence of the "precedence"
characteristic implicit in a digraph. In addition, a very nice correspondence
is shown: the digraph decomposition problem is equivalent to the decomposition
of a set of objects characterized by a number of attributes in the following
way:

(a) The number of objects is the same as the number of attributes, and

(b) Objects' representations in terms of attributes take the form of
binary vectors (i.e., in a sense, an attribute is either "relevant" or
not to a given object).

Limitation (a) makes this approach unsuited to our problem, but
(b) suggests a way of assessing interdependencies among objects (system
requirements in our case) that may be worth investigating.


7. - Summary and implications.

The discussion above has described some cluster analysis and graph
decomposition techniques, and has pointed out strategies that make possible
the application of the former to graph decomposition and vice versa. In this
section we summarize those techniques and strategies and comment on their
application.

The  discussion  in  sections  4 and  5 can  be  summarized  as  follows: 

1) It is possible to put a graph decomposition problem in the framework of
cluster analysis. Since many cluster analysis techniques exist, this can be useful
to solve graph decomposition problems.

2)  Certain  heuristics  that  are  useful  in  graph  decomposition  have  a clear 
counterpart  in  cluster  analysis.  In  particular,  they  can  be  employed  to 
identify  initial  partitions  in  non-agglomerative  cluster  analysis  methods. 

3)  Normalized  similarity  (or  dissimilarity)  matrices  can  be  derived  for 
both  graph  decomposition  and  cluster  analysis  problems.  In  addition,  the 
resulting  matrices  display  the  structure  of  the  decomposition  problem  better 
than  the  original  data  (e.g.,  "collapsable"  objects  are  made  apparent). 

4)  Agglomerative  cluster  analysis  algorithms  are  usually  fast  but  make 
early  commitments  that  are  never  revised  and  which  can  lead  to  undesirable 


results. 


[Fig. 19: flowchart of the proposed decomposition strategy. Recoverable box
labels: "Decomposition problem"; "Use Kevorian's approach (6)"; "Put problem
in cluster analysis form (normalize) (5.3.2)"; "Use agglomerative techniques
to get initial partition (4.1)".]

If a normalization route is taken we advocate the use of the heuristic
approach described in sections 5.3.1 and 5.3.3 to identify an initial
partition, the reason being that some computations needed for the normalization
procedure can also be used for partitioning purposes.

The output of Fig. 19 will be a decomposition of the original set
of objects. In [Andreu & Madnick 77] we proposed to interpret the result
intuitively. A more formal approach, which is possible if the initial
problem originated from a characterization of each object by means of a set
of attributes, is to analyze which of these attributes are predominant in
each of the subsets, using factor analysis.

With regard to a methodology for the assessment of interdependencies
among pairs of objects (requirements), it seems that working with a meaningful
set of attributes gives wide choices with respect to applicable decomposition
techniques. In addition, proceeding this way would reduce the number of assessments
to be made ($na$ as opposed to $n^2$, where $n$ is the number of objects and $a$ the
number of attributes; we are assuming $n > a$). Significant improvement over a
simple yes/no assessment would be achieved even with the simplest attribute-oriented
approach (like assessing only the relevance or non-relevance of
each attribute for each object).




APPENDIX  I 


For the discussion in this Appendix, we assume that the graph $(O, A)$
is connected (i.e., there are no disjoint subgraphs). This assumption is
reasonable because if there were disjoint subgraphs they could be treated separately,
i.e., they would be completely independent of one another for our purposes.

With this provision, we show that if we define

$$s_{ij} = \text{number of links in a minimum path from } o_i \text{ to } o_j,$$

then it is true that:

a) $s_{ij} = 0 \Leftrightarrow i = j$,

b) $s_{ij} \geq 0 \ \forall i,j$,

c) $s_{ij} = s_{ji} \ \forall i,j$, and

d) $s_{ij} \leq s_{ik} + s_{kj} \ \forall i,j,k$.

We proceed as follows:

a) It is obvious that

$$i = j \Rightarrow s_{ij} = 0 \ \forall i,j.$$

Furthermore, $s_{ij} = 0$ implies that we can travel from node $o_i$ to node $o_j$
without traversing any link. This is only possible if $i = j$. So,

$$s_{ij} = 0 \Rightarrow i = j \ \forall i,j,$$

which completes the proof.

b) $s_{ij} \geq 0$ follows immediately from the definition of $s_{ij}$ (i.e., "number
of links" cannot be negative).

c) Since the adjacency matrix $A$ is symmetric, so is $S$:
$s_{ij} = s_{ji} \ \forall i,j$.


d) To show that

$$s_{ij} \leq s_{ik} + s_{kj} \ \forall i,j,k,$$

we consider two cases:

i) $o_k$ is in some minimum path from $o_i$ to $o_j$.

If this is the case, since both $s_{ik}$ and $s_{kj}$ are minimum path lengths,
there is no better way of going from $o_i$ to $o_j$ than going from $o_i$ to $o_k$ in $s_{ik}$
steps, then from $o_k$ to $o_j$ in $s_{kj}$ steps, i.e.:

$$s_{ij} = s_{ik} + s_{kj}.$$

ii) $o_k$ is not in a minimum path from $o_i$ to $o_j$.

This means that $s_{ik} + s_{kj}$ is the length of a path from $o_i$ to $o_j$ which
is not minimum, i.e.:

$$s_{ij} < s_{ik} + s_{kj}.$$
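The four properties can also be checked numerically on a small case. The Python sketch below computes minimum path lengths by breadth-first search and tests conditions (a) to (d) exhaustively; the function names and the five-node connected graph are hypothetical and serve only as an illustration of the argument, not as a substitute for it.

```python
from collections import deque
from itertools import product


def min_path_lengths(adjacency):
    """s_ij = number of links in a minimum path, found by breadth-first search.

    Assumes a connected, undirected graph given as a symmetric 0/1 matrix.
    """
    n = len(adjacency)
    lengths = []
    for start in range(n):
        row = [None] * n
        row[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adjacency[u][v] and row[v] is None:
                    row[v] = row[u] + 1
                    queue.append(v)
        lengths.append(row)
    return lengths


def is_metric(s):
    """Check conditions (a) to (d) for every pair and triple of indices."""
    idx = range(len(s))
    identity = all((s[i][j] == 0) == (i == j) for i, j in product(idx, repeat=2))
    nonneg = all(s[i][j] >= 0 for i, j in product(idx, repeat=2))
    symmetric = all(s[i][j] == s[j][i] for i, j in product(idx, repeat=2))
    triangle = all(s[i][j] <= s[i][k] + s[k][j] for i, j, k in product(idx, repeat=3))
    return identity and nonneg and symmetric and triangle


# Hypothetical connected graph with links 0-1, 0-2, 1-2, 2-3, 3-4.
G = [[1, 1, 1, 0, 0],
     [1, 1, 1, 0, 0],
     [1, 1, 1, 1, 0],
     [0, 0, 1, 1, 1],
     [0, 0, 0, 1, 1]]
S = min_path_lengths(G)
ok = is_metric(S)
```

For this graph the shortest route from node 0 to node 4 uses three links, and `ok` confirms that all four conditions hold for the resulting matrix.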




APPENDIX  II 


Given three nodes $o_i$, $o_j$ and $o_k \in O$, with associated core sets $CS_i$, $CS_j$
and $CS_k$, we defined $s_{ij}$, $s_{ik}$ and $s_{kj}$ as follows:

$$s_{ij} = 1 - \frac{|CS_i \cap CS_j|}{|CS_i \cup CS_j|},$$

$$s_{ik} = 1 - \frac{|CS_i \cap CS_k|}{|CS_i \cup CS_k|}, \quad \text{and}$$

$$s_{kj} = 1 - \frac{|CS_k \cap CS_j|}{|CS_k \cup CS_j|}.$$
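Before turning to the formal argument, the behavior of these quantities can be spot-checked by brute force on a small universe. The Python sketch below enumerates every triple of non-empty subsets of a four-element set and looks for violations of non-negativity, symmetry and the triangle inequality; the function name `core_distance` and the choice of universe are assumptions made for the illustration, and exact fractions are used to avoid rounding issues.

```python
from fractions import Fraction
from itertools import combinations, product


def core_distance(a, b):
    """1 - |a ∩ b| / |a ∪ b| for two non-empty sets, as an exact fraction."""
    return 1 - Fraction(len(a & b), len(a | b))


# All 15 non-empty subsets of a hypothetical four-element universe.
universe = range(4)
subsets = [frozenset(c) for r in range(1, 5) for c in combinations(universe, r)]

violations = []
for a, b, c in product(subsets, repeat=3):
    if core_distance(a, b) < 0:
        violations.append(("non-negativity", a, b))
    if core_distance(a, b) != core_distance(b, a):
        violations.append(("symmetry", a, b))
    if core_distance(a, b) > core_distance(a, c) + core_distance(c, b):
        violations.append(("triangle", a, b, c))
```

No violation is found in this exhaustive check, which is consistent with the properties established in this Appendix, although of course it proves nothing beyond the enumerated cases.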


Here, we show that these quantities meet conditions 2, 3 and 4 of section
3, i.e., that

a) $s_{ij} \geq 0 \ \forall i,j$,

b) $s_{ij} = s_{ji} \ \forall i,j$, and

c) $s_{ij} \leq s_{ik} + s_{kj} \ \forall i,j,k$.

We  proceed  as  follows: 

a) Since for any pair of sets $CS_i$ and $CS_j$ it is always true that

$$|CS_i \cup CS_j| \geq |CS_i \cap CS_j|, \qquad (1)$$

it is obvious that

$$\frac{|CS_i \cap CS_j|}{|CS_i \cup CS_j|} \leq 1 \ \forall i,j.$$

Thus, $s_{ij} \geq 0 \ \forall i,j$.

Furthermore, since $|CS_i \cap CS_j| \geq 0$ (*) for any $i,j$, (1) also implies that
$s_{ij} \leq 1 \ \forall i,j$.

(*) Throughout the discussion in this Appendix, it is useful to keep in mind that
$|X| \geq 0$, whatever the set $X$ is.


b) Since both set union and intersection are commutative operations, it
follows that

$$s_{ij} = s_{ji} \ \forall i,j.$$

c) To show that

$$s_{ij} \leq s_{ik} + s_{kj} \ \forall i,j,k, \qquad (2)$$

we will consider three cases that cover all the possible values for $s_{ij}$ ($0 \leq s_{ij} \leq 1$,
see (a) above).

i) $s_{ij} = 0$.

Since $s_{ik}, s_{kj} \geq 0$ (see (a) above), (2) follows immediately in this
case.


ii) $s_{ij} = 1$.

By definition of $s_{ij}$ this implies that
$|CS_i \cap CS_j| = 0$, i.e., $CS_i \cap CS_j = \emptyset$, so that

$$|CS_i \cap CS_k| + |CS_k \cap CS_j| \leq |CS_k| \qquad (3)$$

(because since $CS_i$ and $CS_j$ are disjoint, elements in $CS_k$ can only belong to
either $CS_i$ or $CS_j$).

Then, $s_{ik} + s_{kj} < 1$ ($= s_{ij}$) is impossible, because it implies

$$1 - \frac{|CS_i \cap CS_k|}{|CS_i \cup CS_k|} + 1 - \frac{|CS_k \cap CS_j|}{|CS_k \cup CS_j|} < 1,$$

or

$$\frac{|CS_i \cap CS_k|}{|CS_i \cup CS_k|} + \frac{|CS_k \cap CS_j|}{|CS_k \cup CS_j|} > 1. \qquad (4)$$

If we now realize that

$$|CS_i \cup CS_k| \geq |CS_k| \quad \text{and} \quad |CS_k \cup CS_j| \geq |CS_k|, \qquad (5)$$

it follows that if (4) holds, so does

$$|CS_i \cap CS_k| + |CS_k \cap CS_j| > |CS_k|,$$

which contradicts (3).


6) $|CS_k| < |CS_i \cup CS_j|$.

This subcase requires a more careful proof. The following arguments
simplify it:

- $s_{ij} > s_{ik} + s_{kj}$ cannot hold unless $CS_k$ has elements in common with both
$CS_i$ and $CS_j$. To see this, recall that by definition of $s_{ik}$, $CS_i \cap CS_k = \emptyset \Rightarrow s_{ik} = 1$,
so that since $s_{ij} \leq 1$ the above inequality is impossible regardless of the value
of $s_{kj}$. Therefore, we need only to consider the case in which $CS_i \cap CS_k \neq \emptyset$ and
$CS_k \cap CS_j \neq \emptyset$.

- If $s_{ij} > s_{ik} + s_{kj}$ does not hold when $CS_k \cup (CS_i \cup CS_j) = CS_i \cup CS_j$ (i.e.,
when $CS_k$ has no elements outside $CS_i \cup CS_j$), it will not hold when $CS_k$ has
elements outside $CS_i \cup CS_j$. To see this, let

$CS'_k$ represent the part of $CS_k$ inside $CS_i \cup CS_j$, and

$CS''_k$ represent the part of $CS_k$ outside $CS_i \cup CS_j$, i.e.:

$$CS'_k \cup CS''_k = CS_k,$$

$$CS'_k \cap CS''_k = \emptyset,$$

$$CS'_k \cup (CS_i \cup CS_j) = CS_i \cup CS_j, \quad \text{and}$$

$$CS''_k \cap (CS_i \cup CS_j) = \emptyset.$$

Then, it is apparent that

$$|CS_i \cup CS_k| = |CS_i \cup CS'_k| + |CS''_k|,$$

$$|CS_k \cup CS_j| = |CS'_k \cup CS_j| + |CS''_k|,$$

$$|CS_i \cap CS_k| = |CS_i \cap CS'_k|, \quad \text{and}$$

$$|CS_k \cap CS_j| = |CS'_k \cap CS_j|,$$

so that (7) can be written as follows:

$$\frac{|CS_i \cap CS'_k|}{|CS_i \cup CS'_k| + |CS''_k|} + \frac{|CS'_k \cap CS_j|}{|CS'_k \cup CS_j| + |CS''_k|} > 1 + \alpha,$$

from where we see that if (7) does not hold for $|CS''_k| = 0$, it certainly won't hold
for $|CS''_k| > 0$.

Consequently, in what follows we limit our attention to the proof that (7)
is impossible when

$$CS_i \cap CS_k \neq \emptyset, \quad CS_k \cap CS_j \neq \emptyset, \quad \text{and} \quad CS_k \cup (CS_i \cup CS_j) = CS_i \cup CS_j.$$

To do this, we rewrite (7) as follows:

$$\frac{|CS_i \cap CS_k|}{|CS_i \cup CS_k|} + \frac{|CS_k \cap CS_j|}{|CS_k \cup CS_j|} > \frac{|CS_i| + |CS_j|}{|CS_i \cup CS_j|} \qquad (10)$$


and  consider  the  following  cases: 

I)  |CS  I = |CS.  U cs. I - n,  n > 0, 

K 1 3 

where  the  n elements  belong  to  CS^ H CS^  (recall  that  since  we  assume  a > 0, 
CS.  A CS . is  non-empty). 

The  situation  is  depicted  in  Fig.  AII-1. 


[Fig. AII-1. Labels: $CS_i \cap CS_j$; n elements]


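In this case, (10) would read

$$|CS_i|\,|CS_i \cup CS_j| - n\,|CS_i \cup CS_j| + |CS_j|\,|CS_i \cup CS_j| - n\,|CS_i \cup CS_j| > |CS_i|\,|CS_i \cup CS_j| - n\,|CS_i| + |CS_j|\,|CS_i \cup CS_j| - n\,|CS_j|,$$

i.e.:

$$n\,(|CS_i| + |CS_j|) > 2n\,|CS_i \cup CS_j|,$$

which is impossible because neither $|CS_i|$ nor $|CS_j|$ exceeds $|CS_i \cup CS_j|$.

II) $|CS_k| = |CS_i \cup CS_j| - n_i - n_j$, $n_i, n_j > 0$,

where the $n_i$ elements belong to $CS_i$ but not to $CS_i \cap CS_j$, and the $n_j$ belong to $CS_j$
but not to $CS_i \cap CS_j$.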

The  situation  is  depicted  in  Fig.  AII-2. 


[Fig. AII-2. Labels: $CS_i \cap CS_j$; $n_i$ elements; $n_j$ elements]


In this case, (10) would read

$$\frac{|CS_i| - n_i}{|CS_i \cup CS_j| - n_j} + \frac{|CS_j| - n_j}{|CS_i \cup CS_j| - n_i} > \frac{|CS_i| + |CS_j|}{|CS_i \cup CS_j|} \tag{11}$$

(note that this expression reduces to the previous case when $n_i = n_j$).

Expanding,

$$\{|CS_i| - n_i\}\{|CS_i \cup CS_j| - n_i\}\,|CS_i \cup CS_j| + \{|CS_j| - n_j\}\{|CS_i \cup CS_j| - n_j\}\,|CS_i \cup CS_j| > \{|CS_i| + |CS_j|\}\{|CS_i \cup CS_j| - n_i\}\{|CS_i \cup CS_j| - n_j\}.$$


For  convenience,  let 


$|CS_i| = I$, $|CS_j| = J$, $|CS_i \cup CS_j| = A$.

We then have:

$$IA^2 - n_i IA - n_i A^2 + n_i^2 A + JA^2 - n_j JA - n_j A^2 + n_j^2 A > IA^2 - n_i IA - n_j IA + n_i n_j I + JA^2 - n_i JA - n_j JA + n_i n_j J, \tag{12}$$

i.e.

$$n_i A (J + n_i - A) + n_j A (I + n_j - A) - n_i n_j (I + J) > 0. \tag{13}$$


By (12), it is now apparent that (see Fig. AII-2 and recall that we
assumed that neither the $n_i$ nor the $n_j$ elements belong to $CS_i \cap CS_j$)

$$J + n_i - A = |CS_j| + n_i - |CS_i \cup CS_j| \le 0,$$

$$I + n_j - A = |CS_i| + n_j - |CS_i \cup CS_j| \le 0,$$

and obviously

$$-n_i n_j (|CS_i| + |CS_j|) < 0,$$

so that (13), and hence (11) and (7), are impossible.
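The algebra leading from the expanded inequality to the factored form can be checked mechanically. The sketch below is an illustration only: it assumes $I = |CS_i|$, $J = |CS_j|$, $A = |CS_i \cup CS_j|$, takes the $n_i$ elements from $CS_i$ outside $CS_i \cap CS_j$ and the $n_j$ elements from $CS_j$ outside $CS_i \cap CS_j$, and enumerates small integer sizes. The helper names `expanded_difference`, `factored_form`, and `check` are hypothetical.

```python
import itertools


def expanded_difference(I, J, A, ni, nj):
    """Left side minus right side of the expanded (cross-multiplied) inequality."""
    left = (I - ni) * (A - ni) * A + (J - nj) * (A - nj) * A
    right = (I + J) * (A - ni) * (A - nj)
    return left - right


def factored_form(I, J, A, ni, nj):
    """n_i A (J + n_i - A) + n_j A (I + n_j - A) - n_i n_j (I + J)."""
    return ni * A * (J + ni - A) + nj * A * (I + nj - A) - ni * nj * (I + J)


def check(max_size=6):
    """Enumerate small set sizes; return how many configurations were checked."""
    checked = 0
    for I, J in itertools.product(range(1, max_size + 1), repeat=2):
        for C in range(1, min(I, J) + 1):        # C = |CS_i ∩ CS_j| > 0
            A = I + J - C                        # |CS_i ∪ CS_j|
            for ni in range(1, I - C + 1):       # elements of CS_i outside CS_j
                for nj in range(1, J - C + 1):   # elements of CS_j outside CS_i
                    value = factored_form(I, J, A, ni, nj)
                    assert value == expanded_difference(I, J, A, ni, nj)
                    assert value < 0             # so the strict inequality fails
                    checked += 1
    return checked


print(check())
```

Every enumerated configuration gives a negative value, which agrees with the sign argument in the text; the enumeration is a sanity check, not a substitute for the proof.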


III) $|CS_k| = |CS_i \cup CS_j| - n_i - n_j - n$, $n_i, n_j, n > 0$,

where $n_i$ and $n_j$ are the same as before and the n elements belong to $CS_i \cap CS_j$.


The  situation  is  depicted  in  Fig.  AII-3. 


[Fig. AII-3. Label: $CS_i \cap CS_j$]

In this case, (10) would read

$$\frac{|CS_i| - n_i - n}{|CS_i \cup CS_j| - n_j} + \frac{|CS_j| - n_j - n}{|CS_i \cup CS_j| - n_i} > \frac{|CS_i| + |CS_j|}{|CS_i \cup CS_j|}.$$

Comparing this inequality with (11), since n > 0, it is obvious that if
(11) does not hold neither will this one. This completes the proof for subcase B.
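As a complement to the case analysis, the triangle inequality can be checked exhaustively on a small universe. This is a minimal sketch under two assumptions that are not restated in this part of the appendix: the similarity is $s_{ij} = |CS_i \cap CS_j| / |CS_i \cup CS_j|$ and the distance is $d_{ij} = 1 - s_{ij}$. The names `jaccard_distance`, `nonempty_subsets`, and `triangle_violations` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product


def jaccard_distance(a, b):
    """Assumed distance d = 1 - |a ∩ b| / |a ∪ b| for non-empty sets."""
    return 1 - Fraction(len(a & b), len(a | b))


def nonempty_subsets(universe):
    """All non-empty subsets of the given universe, as frozensets."""
    items = sorted(universe)
    return [
        frozenset(combo)
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


def triangle_violations(universe):
    """Triples (CS_i, CS_j, CS_k) with d_ij > d_ik + d_kj, if any exist."""
    subsets = nonempty_subsets(universe)
    return [
        (cs_i, cs_j, cs_k)
        for cs_i, cs_j, cs_k in product(subsets, repeat=3)
        if jaccard_distance(cs_i, cs_j)
        > jaccard_distance(cs_i, cs_k) + jaccard_distance(cs_k, cs_j)
    ]


print(len(triangle_violations({1, 2, 3, 4})))
```

Exact rational arithmetic (`Fraction`) is used so that boundary cases where the inequality holds with equality are not misreported because of rounding.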




APPENDIX  III 


Here we show that the distances $d_{ij}$ defined as

$$d_{ij} = 1 - \frac{s_{i*} \otimes s_{j*}}{s_{i*} \otimes s_{i*} + s_{j*} \otimes s_{j*} - s_{i*} \otimes s_{j*}}, \tag{1}$$

where

$$s_{i*} \otimes s_{j*} = \sum_{k=1}^{n} [\min(s_{ik}, s_{jk})]^2, \tag{2}$$

and where the $s_{ij}$ are entries in a similarity matrix with the properties
discussed in section 5.3.2, meet the metric conditions of section 3.

We  proceed  as  follows: 

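a) $d_{ij} = 0$ implies that (by (1) and (2))

$$\sum_{k=1}^{n} [\min(s_{ik}, s_{jk})]^2 = \sum_{k=1}^{n} s_{ik}^2 + \sum_{k=1}^{n} s_{jk}^2 - \sum_{k=1}^{n} [\min(s_{ik}, s_{jk})]^2,$$

i.e.,

$$2 \sum_{k=1}^{n} [\min(s_{ik}, s_{jk})]^2 = \sum_{k=1}^{n} s_{ik}^2 + \sum_{k=1}^{n} s_{jk}^2,$$

which in turn implies

$$s_{ik} = s_{jk} \quad \forall k,$$

for otherwise

$$2 \sum_{k=1}^{n} [\min(s_{ik}, s_{jk})]^2 < \sum_{k=1}^{n} s_{ik}^2 + \sum_{k=1}^{n} s_{jk}^2. \tag{3}$$

Thus,

$$d_{ij} = 0 \Rightarrow s_{i*} = s_{j*}.$$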


As in the graph case, it may happen that

$$d_{ij} = 0 \text{ with } i \neq j,$$

but we take care of this circumstance by collapsing objects $o_i$ and $o_j$, so that
for all practical purposes,

$$d_{ij} = 0 \Rightarrow i = j. \tag{4}$$

Furthermore, if i = j obviously $d_{ij} = 0$. This, together with (4) yields

$$d_{ij} = 0 \iff i = j \quad \text{(collapsing nodes if necessary)} \tag{5}$$


b) Equation (3) readily implies that

$$d_{ij} > 0 \text{ if } s_{i*} \neq s_{j*},$$

i.e., by (5),

$$d_{ij} \ge 0 \quad \forall i, j.$$

Furthermore, it is apparent that $s_{i*} \otimes s_{j*} \ge 0 \ \forall i, j$, so that

$$d_{ij} \le 1 \quad \forall i, j.$$


c) By definition of $\otimes$ (see (2) above), and since "min" is a commutative
operation, it follows that

$$d_{ij} = d_{ji} \quad \forall i, j.$$


d) To show that

$$d_{ij} \le d_{ik} + d_{kj} \quad \forall i, j, k,$$

we can follow exactly the same strategy as in point c) of Appendix II. We will not
repeat it all here; we will only show that the same inequalities employed there
hold true here. (In what follows, we refer to equation i of Appendix II by II-i):


- Situation II-3, in the new notation, would read

$$s_{i*} \otimes s_{k*} + s_{k*} \otimes s_{j*} \le s_{k*} \otimes s_{k*},$$

i.e., by (2),

$$\sum_{p=1}^{n} [\min(s_{ip}, s_{kp})]^2 + \sum_{p=1}^{n} [\min(s_{kp}, s_{jp})]^2 \le \sum_{p=1}^{n} s_{kp}^2. \tag{6}$$

Since obviously

$$[\min(s_{ip}, s_{kp})]^2 \le s_{kp}^2 \quad \forall i, k, p, \tag{7}$$

because we assume $s_{ij} \ge 0$, (6) holds true.

- Equation(s) II-5 would read

$$\sum_{p=1}^{n} [\min(s_{ip}, s_{kp})]^2 \le \sum_{p=1}^{n} s_{kp}^2 \quad \forall i, k,$$

which by (7) is also true.

- Equation II-8a would read

$$\underbrace{\sum_{p=1}^{n} [\min(s_{ip}, s_{kp})]^2}_{A} + \underbrace{\sum_{p=1}^{n} [\min(s_{kp}, s_{jp})]^2}_{B} \le \underbrace{\sum_{p=1}^{n} s_{ip}^2}_{C} + \underbrace{\sum_{p=1}^{n} s_{jp}^2}_{D},$$

which also by (7) holds here, since $A \le C$ and $B \le D$.

- The proof corresponding to subcase B in Appendix II, finally, can be
followed step by step in the generalized setting.
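The generalized distance can also be exercised numerically. The sketch below is illustrative only: it assumes $d_{ij} = 1 - (s_{i*} \otimes s_{j*}) / (s_{i*} \otimes s_{i*} + s_{j*} \otimes s_{j*} - s_{i*} \otimes s_{j*})$ with $s_{i*} \otimes s_{j*} = \sum_k [\min(s_{ik}, s_{jk})]^2$, and it assumes a symmetric similarity matrix with entries in [0, 1] and a unit diagonal, since the properties of section 5.3.2 are not restated here. The names `otimes`, `distance`, `random_similarity_matrix`, and `check_metric` are hypothetical.

```python
import random
from itertools import product


def otimes(row_a, row_b):
    """Sum over k of min(a_k, b_k) squared, for two rows of the similarity matrix."""
    return sum(min(a, b) ** 2 for a, b in zip(row_a, row_b))


def distance(S, i, j):
    """Distance between objects i and j computed from rows i and j of S."""
    cross = otimes(S[i], S[j])
    denom = otimes(S[i], S[i]) + otimes(S[j], S[j]) - cross
    return 1.0 - cross / denom


def random_similarity_matrix(n, seed=0):
    """Assumed form: symmetric, entries in [0, 1), ones on the diagonal."""
    rng = random.Random(seed)
    S = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            S[i][j] = S[j][i] = rng.random()
    return S


def check_metric(S, tol=1e-12):
    """Check bounds, symmetry, zero self-distance and the triangle inequality."""
    n = len(S)
    for i, j in product(range(n), repeat=2):
        d = distance(S, i, j)
        assert -tol <= d <= 1 + tol
        assert abs(d - distance(S, j, i)) <= tol
        if i == j:
            assert abs(d) <= tol
    for i, j, k in product(range(n), repeat=3):
        assert distance(S, i, j) <= distance(S, i, k) + distance(S, k, j) + tol
    return True


S = random_similarity_matrix(6)
print(check_metric(S))
```

A small tolerance is used because the distances are floating-point values; a passing run on one random matrix is only a spot check of the properties argued in points a) to d).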




APPENDIX  IV 


Here we show a very simple way of converting a dissimilarity
matrix $D(d_{ij})$ into a similarity one $S(s_{ij})$. By simply letting

$$s_{ij} = A - d_{ij}, \quad \text{where } A = \max(D),$$

the entries in S will behave as similarity coefficients. In particular, if
the entries in D are metrics (see section 3), then the entries in S will have
the following properties:

1) $s_{ij} = A$ iff $i = j$

2) $s_{ij} = s_{ji} \quad \forall i, j$

3) $s_{ij} \ge 0 \quad \forall i, j$

4) $s_{ij} \le A \quad \forall i, j$

because of the definition of A.
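A minimal sketch of this conversion, assuming it takes the form $s_{ij} = A - d_{ij}$ with A the largest entry of D (the form consistent with properties 1 to 4 above). The function name `dissimilarity_to_similarity` and the example matrix are hypothetical.

```python
def dissimilarity_to_similarity(D):
    """Return (S, A) with S[i][j] = A - D[i][j] and A the largest entry of D."""
    A = max(max(row) for row in D)
    S = [[A - d for d in row] for row in D]
    return S, A


# Illustrative dissimilarity matrix: symmetric, zero diagonal, and the
# off-diagonal entries satisfy the triangle inequality.
D = [
    [0, 2, 5],
    [2, 0, 4],
    [5, 4, 0],
]

S, A = dissimilarity_to_similarity(D)
for row in S:
    print(row)
```

In this example the diagonal of S equals A, S is symmetric, and every entry lies between 0 and A, matching the listed properties.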


REFERENCES


[Anderberg 73]:

Anderberg,  M.  R.:  "Cluster  Analysis  for  Applications,"  Academic 

Press,  New  York,  1973. 

[Andreu & Madnick 77]:

Andreu,  R.  C.  and  S.  E.  Madnick:  "A  systematic  approach  to  the 

design  of  complex  systems:  application  to  DBMS  design  and  evaluation," 

CISR  Report  No. 32,  Sloan  School  of  Management,  M.I.T.,  March  1977. 

[Astrahan  70] : 

Astrahan,  M.  M.:  "Speech  analysis  by  clustering,  or  the  hyperphoneme 

method," Stanford Artificial Intelligence Project Memo AIM-124, AD
709067, Stanford University, 1970.

[Boesch & Gimpel 77]:

Boesch, F. T. and J. F. Gimpel: "Covering the points of a digraph

with point-disjoint paths and its application to code optimization,"
Journal of the Association for Computing Machinery, 24-2, April 1977.

[Choffray 77]:

Choffray,  J.  F.:  Unpublished  doctoral  thesis,  M.I.T.  Sloan  School 

of  Management,  April  1977. 

[Curry  76] : 

Curry,  D.  J.:  "Some  statistical  considerations  in  clustering  with 

binary  data,"  Multivariate  Behavioral  Research , April  1976. 


[Deo  74] : 

Deo,  N. : "Graph  Theory  with  Application  to  Engineering  and  Computer 

Science,"  Prentice  Hall,  New  York,  1974. 

[Freeman  76] : 

Freeman,  P.:  "The  context  of  design,"  in  "Tutorial  on  Software  Design 

Techniques,"  P.  Freeman,  ed. , IEEE  Catalogue  No. 76CH1145-2C. 

[Haralick  74] : 

Haralick,  Robert  M. : "The  diclique  representation  and  decomposition  of 

binary  relations,"  Journal  of  the  Association  for  Computing  Machinery, 
21-3,  July  1974. 

[Hartigan  75] : 

Hartigan,  J.:  "Clustering  Algorithms,"  Wiley  Interscience,  1975. 

[Hubert  74] : 

Hubert,  L.  J.:  "Some  applications  of  graph  theory  to  clustering," 

Psychometrika  39  (September  1974) . 




[Kevorian & Snoek 71]:

Kevorian, A. K. and J. Snoek: "Decomposition in large-scale systems:

theory and applications of structural analysis in partitioning,
disjoining and constructing hierarchical systems," in "Decomposition
of Large-Scale Systems," D. M. Himmelblau, ed., North-Holland
Publishing Co., 1973.


